Most of the filesharing systems out there nowadays are barely better than FTP, which is one of the very oldest protocols on the Internet. Sure, the underlying technology is a lot different, but functionally we don’t seem to have advanced very far. Think about it. Most of the filesharing code is based on a model of non-transparent standalone applications, whole-file upload and download, and so on. The thin veneer of search capability doesn’t count for a whole lot compared to Archie/Veronica or even grepping through the ubiquitous 00INDEX.TXT. I was grabbing recipes and programs and images just as productively via FTP twelve years ago as I can today using Gnutella or Freenet. The only differences are the bandwidth and the content that’s available.

What I’d like to suggest is that we need to set our sights higher. FTP functionality isn’t worth striving for. The ultimate irony is seeing criticism of a project for not being innovative enough from people whose functional target is barely an improvement over FTP and who don’t even have that fully working yet. If you want to get serious about innovation, here are some ideas, first at the feature level and then at the technical level.

  1. There absolutely must be a guarantee of data availability between the time data is inserted and the time it is explicitly deleted or obsoleted. The system should never drop data during normal operation, and should endeavor to prevent such loss even in the face of common node/link failures.
  2. Directories and folders are nice. Flat namespaces are just so “twenty years ago” and yet, underneath the search veneer, that’s what most filesharing apps have. It should be possible to upload a source tree, for example, and have the filesharing network preserve its structure instead of having to use an archiving tool.
  3. Going one level deeper, I should be able to access part of a file without having to fetch the whole file.
  4. I should be able to update or overwrite a datum in place, at whatever granularity we’re using, instead of having to create a new datum with a new key and keep track of what obsoletes what. (Items 3 and 4 are sketched in code right after this list.)
  5. Transparency is nice. Once I’ve found the aforementioned source tree, I should be able to mount it, browse it and use it just like I do with my local files. Yes, ultimately I’d even like to be able to do memory-mapped I/O on those files.
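
To make items 3 and 4 concrete, here is a minimal sketch of what block-granular access might look like from the client side. Everything in it is illustrative: the BlockStore class, the 4KB block size, and the function names are invented for the example, and a real implementation would move blocks to and from peers over the network rather than a local dictionary.

```python
BLOCK_SIZE = 4096  # hypothetical fixed block size

class BlockStore:
    """Toy stand-in for a distributed block store, keyed by
    (file_key, block_number). A real network would fetch and push
    blocks from/to peers instead of a local dict."""

    def __init__(self):
        self._blocks = {}

    def get(self, file_key, block_no):
        # Unwritten blocks read back as zeroes, like a sparse file.
        return self._blocks.get((file_key, block_no), b"\0" * BLOCK_SIZE)

    def put(self, file_key, block_no, data):
        assert len(data) == BLOCK_SIZE
        self._blocks[(file_key, block_no)] = data


def read_range(store, file_key, offset, length):
    # Item 3: fetch only the blocks covering [offset, offset + length).
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    data = b"".join(store.get(file_key, b) for b in range(first, last + 1))
    start = offset - first * BLOCK_SIZE
    return data[start:start + length]


def write_range(store, file_key, offset, data):
    # Item 4: overwrite in place, touching only the affected blocks.
    pos = 0
    while pos < len(data):
        block_no, block_off = divmod(offset + pos, BLOCK_SIZE)
        chunk = data[pos:pos + BLOCK_SIZE - block_off]
        block = bytearray(store.get(file_key, block_no))
        block[block_off:block_off + len(chunk)] = chunk
        store.put(file_key, block_no, bytes(block))
        pos += len(chunk)
```

The payoff is that a 100-byte read or overwrite in the middle of a gigabyte file touches exactly one block: no whole-file transfer, no new key, no bookkeeping about what obsoletes what.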

Note that I don’t address security/anonymity in this list. It’s not because those things aren’t important; au contraire, it’s because those issues are so important and so complicated that they’d double the length of this already-long article. I’m a strong believer in maintaining focus.

So, what technical directions are implied by the above? This is probably going to be even more controversial than the previous list, but here goes:

  1. The filesharing code needs to be implemented as an actual filesystem, not a separate application. On Windows, an Explorer namespace extension might do for whole-file access. One implication here is that Java is not an option. Neither is Perl, neither is Python, etc. Get over it. I’m a Python lover myself, and I’ve actually prototyped a significant part of such a filesystem in user-space using Python, but I know that sooner or later it will have to turn into C and go into the kernel.
  2. We need directory (and possibly attribute) operations as well as file operations. Data location should be some combination of searching and directory traversal.
  3. Data storage, access, and transport need to be block-centered, not file-centered. The blocks may well be identified as tree:file_path:offset, but each block is still moved, tracked, and acted upon throughout most of the system as an independent entity (see the sketch after this list).
  4. Data caching throughout the network is absolutely necessary for performance. If we’re caching and we allow in-place modification, we need cache coherency. Been there, done that, c’est la vie.
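
To illustrate points 3 and 4 together, here is a minimal sketch of the tree:file_path:offset naming scheme plus one classic coherency approach, directory-based write-invalidation. Nothing above prescribes this particular protocol, and every name in the sketch is invented for the example.

```python
from collections import defaultdict

def block_key(tree, file_path, offset):
    # Point 3's naming scheme: a block is addressed by where it lives,
    # but handled as an independent unit once identified.
    return f"{tree}:{file_path}:{offset}"

class CoherencyDirectory:
    """Toy directory-based write-invalidate coherency: track which nodes
    hold a cached copy of each block, and invalidate every other copy
    when one node writes."""

    def __init__(self):
        self._cachers = defaultdict(set)  # block key -> set of node ids

    def note_read(self, node, key):
        # A read pulls a copy into the reading node's cache.
        self._cachers[key].add(node)

    def note_write(self, writer, key):
        # A write makes every other cached copy stale. In a real system
        # this would trigger invalidation messages over the network.
        stale = self._cachers[key] - {writer}
        self._cachers[key] = {writer}
        return stale  # nodes that must discard their copies

# Nodes A and B both cache a block; then A overwrites it.
d = CoherencyDirectory()
k = block_key("mytree", "src/main.c", 8192)
d.note_read("A", k)
d.note_read("B", k)
print(d.note_write("A", k))  # {'B'}
```

The toy version hides all the hard parts, of course: doing the invalidations efficiently, and doing them correctly when nodes and links fail, is exactly the “been there, done that” work mentioned above.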

Phew. That’s a lot of work, isn’t it? One might well ask if it’s worth it. The end result will be a lot less like the filesharing apps we know today than like a full-blown distributed filesystem. That’s kind of the point. We should be using advances in technology – and there have been quite a few – to target a high functional level, not a low one.

If somebody did all this (I’ve already contributed to one major piece, and am working on another), it opens up some interesting possibilities. Such a system would be useful not just for exchanging music and porn pix with your buddies, but also as an important piece of infrastructure for real applications. Why get involved in complex presence detection and messaging to communicate, when you can put information in a shared file? Right now there’s this weird split between local and distributed applications, with the former often using shared files and the latter using complex messaging. Clearly application developers would prefer to use shared files, and the only reason they don’t for distributed stuff is that the infrastructure isn’t there. Either the functionality’s totally stone-age, or the performance and robustness are substandard. Give them a real way to share files without those limitations, and a whole bunch of things that are currently very difficult will become much less so.
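
As a toy illustration of that last point, here is presence done with nothing but a shared file. The path and record format are invented for the example; the file is assumed to live on the kind of distributed filesystem described above, mounted like any other.

```python
import json
import time
from pathlib import Path

# Hypothetical mount point and record format for the example.
SHARED = Path("/mnt/dfs/presence/alice.json")

def announce(status):
    # Publisher side: update the shared file in place; no protocol,
    # no broker, no presence server.
    SHARED.write_text(json.dumps({"status": status, "ts": time.time()}))

def check(max_age=60):
    # Consumer side: anyone who can read the file gets presence for free.
    try:
        rec = json.loads(SHARED.read_text())
    except FileNotFoundError:
        return "offline"
    return rec["status"] if time.time() - rec["ts"] < max_age else "stale"
```

That’s the whole “protocol,” and it’s exactly how local applications already behave when they trust their filesystem.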