From the original article:

> I hope that in a not so distant future we can say that data dropping from Freenet is more of a theoretical problem than a real one.

I'm sorry, but Freenet has a long way to go protocol-wise before loss of data ceases to be an issue. I'll try to explain why I say that, but it might take a while so please bear with me.

First, I should point out that I also work in the area of data distribution, and have been doing so since long before Freenet existed. In fact, a system I designed about four years ago had a lot in common with Freenet in terms of caching, though many other aspects were quite different. It's because of that similarity that I've kept a close eye on Freenet.

That brings me to my next point. Ian is probably pretty pissed at me right now, so I feel compelled to point out that I think Freenet is great. It's just unsuitable for some purposes, and IMO has received a disproportionate share of attention relative to many other projects that are also great. Ian has shown far more talent for making promises than for delivering on them, and that's a risk for people like me who work in the same general area. I've seen whole technical areas poisoned and neglected for years because of the disillusionment among investors and would-be deployers who got burned when one early project's hype outran its reality. I don't want to see it again, and Ian seems to be doing his best to ensure that it does happen. That simply pisses me off.

OK, on with the show. Why is it unlikely that the issue of data loss will go away for Freenet? First, it's important to note that anything short of a guarantee that data will stay in the system is worthless. People who might otherwise run serious applications on top of a storage system will not find it acceptable if there's *any* realistic chance that data will be dropped during "normal" operation, or even in the face of common failures. Rant and rave and wallow in denial all you want; that's just the way it is.

So, if you want to provide a *guarantee* that data will remain in the system between the time it's inserted and the time it's superseded by new data or explicitly removed, you have two choices. One is to give the data one or more authoritative "home" locations, and treat all other possible locations as caches of what's in those homes. That's pretty much impossible to reconcile with Freenet's anonymity and non-censorability goals.
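To make the distinction concrete, here's a minimal sketch of that first option (Python, with hypothetical names like HomeNode and CacheNode; it illustrates the design, not any real system's API). The durability guarantee lives entirely in the home, and the caches can evict freely precisely because the home can't:

```python
# Hypothetical sketch of a "home location" store: caches may evict at
# will, because one or more authoritative homes never evict until the
# data is superseded or explicitly removed.

class HomeNode:
    """Authoritative store; the durability guarantee lives here."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value          # must never be evicted

    def get(self, key):
        return self.store.get(key)

    def remove(self, key):
        self.store.pop(key, None)        # explicit removal only

class CacheNode:
    """Best-effort copy; free to evict, since a home still has the data."""
    def __init__(self, home, capacity=1000):
        self.home = home
        self.cache = {}
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            value = self.home.get(key)   # cache miss: ask the home
            if value is not None:
                if len(self.cache) >= self.capacity:
                    # evict an arbitrary entry; safe, the home keeps it
                    self.cache.pop(next(iter(self.cache)))
                self.cache[key] = value
        return self.cache.get(key)
```

And that's exactly the problem: a fixed, authoritative home is a known, attackable, censorable point - the very thing Freenet is designed not to have.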

The only other option is to have each node that caches data know - at the very least - how many other copies there are, so that it doesn't throw away the last one. In practical terms, you pretty much need to ensure that at least two copies remain in the system, to guard against simple failures. Maintaining this information - and it needs to be both accurate and timely - is possible but quite difficult. It becomes more difficult as nodes become less reliable, and more difficult still if you want the system to run efficiently without getting bogged down by coordination traffic. That anonymity thing also tends to get in the way a bit.

It's quite possibly doable; I can almost see the algorithms and protocols in my head, because I've worked on similar ones myself, but they're quite different from the ones Freenet currently uses. Even then, the complexity of the result might well exceed the bounds of maintainability (that wall's a lot closer than people think, in distributed systems), and/or the performance of the result might not be acceptable.
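For the curious, here's a minimal sketch of the kind of bookkeeping I mean (Python again, with hypothetical names like CountingNode and MIN_REPLICAS; an illustration of the idea, not anything Freenet actually does). The hand-waved registry - a shared, accurate, timely view of how many copies of each key exist - is exactly the part that's hard to build over unreliable, anonymous nodes:

```python
# Hypothetical sketch of replica-count-aware eviction. The registry is
# modeled as a plain dict; in a real system, keeping it accurate and
# timely across unreliable nodes is where all the difficulty lives.

MIN_REPLICAS = 2   # guard against simple single-node failures

class CountingNode:
    def __init__(self, registry):
        self.registry = registry   # shared view of copy counts (the hard part)
        self.store = {}

    def insert(self, key, value):
        self.store[key] = value
        self.registry[key] = self.registry.get(key, 0) + 1

    def try_evict(self, key):
        """Evict only if enough other copies are known to remain."""
        if key not in self.store:
            return False
        count = self.registry.get(key, 0)
        if count <= MIN_REPLICAS:
            return False               # we may hold one of the last copies
        # In a real system this check-and-decrement would have to be an
        # atomic, distributed agreement; otherwise two nodes can evict
        # simultaneously, each believing the other still holds a copy.
        self.registry[key] = count - 1
        del self.store[key]
        return True
```

Even this toy version exposes the race: two nodes that read the count at the same moment can both decide it's safe to evict. Closing that race is where the coordination traffic comes from, and the less reliable the nodes, the more of it you need.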

There, at long last, you have it. For all of the reasons described above, I think loss of data will always be a problem in Freenet and derivative systems - a real problem, not a theoretical one, and one that makes it unusable for some purposes. To overcome that, Freenet would have to change so much that it would be unrecognizable. I don't even think it's a valid goal for Freenet. Freenet should continue to be developed for the niche toward which it has always been targeted, and for which it is quite well suited. Other solutions should be found for other problems, and none should create credibility risks for the entire field by claiming to be all things to all people.