Bram Cohen, the implementor of BitTorrent, has thrown a little bit of a tizzy about Microsoft Research’s Avalanche project. He goes so far overboard in his haste to slam it that I feel a detailed response is called for. The rest is “below the fold” to avoid crowding up my front page too much.

First of all, I’d like to clarify that Avalanche is vaporware. It isn’t a product which you can use or test with, it’s a bunch of proposed algorithms.

So was BitTorrent at one time. I remember it being talked up quite a bit long before an implementation was available, so it’s a little hypocritical to criticize Avalanche for the same thing. Some of the original BitTorrent ideas, such as the first half-dozen algorithms for choking/unchoking, sucked – to use Bram’s own term for one of those iterations – though he conveniently forgets his own missteps now. That’s how research works. People come up with ideas, they publish them and talk about them and refine them, and eventually settle on what they feel are the best variants. So it was with BitTorrent; so it is with Avalanche.

I spent most of my talk at Stanford explaining why it’s difficult to benchmark, much less simulate, BitTorrent in a way which is useful.

Difficult, yes. Impossible, no. As usual, Bram commits the “smartest boy in class” fallacy: he can’t figure out how to do something, so he concludes it must not be possible. In actual fact, many projects have successfully used large-scale network simulations to achieve useful results and insights. Bram’s own ideas were modeled at one time too, even if only informally and in his head. A mental model plus a simulation cannot be worse than the mental model alone; even if only a little insight is gained from the simulation, that’s enough to belie his claim.

I can’t fathom how they came up with this. Reading either the source code or the documentation on the BitTorrent web site would have shown that the real choking algorithms work nothing like this. Either they just heard ‘tit-for-tat’ and just made this up, or they for some odd reason dredged up BitTorrent 1.0 and read the source of that.

…or maybe they read the current BitTorrent documentation, which says that “choking lets each peer use a tit-for-tat-ish algorithm to ensure that they get a consistent download rate.” Yes, simplistic tit-for-tat sucked, and the algorithm described for Avalanche seems equally bad, but the current BitTorrent algorithm could still loosely be described as falling in the general tit-for-tat category. Such an error does not justify accusing them of “making it up,” nor does it invalidate their entire simulation.
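For readers who haven’t seen the term, a simplistic tit-for-tat unchoking rule – the naive kind that sucked, not BitTorrent’s actual production algorithm – can be sketched in a few lines. The function and the rates below are my own illustration, not anyone’s real code:

```python
# Hypothetical sketch of a simplistic "tit-for-tat-ish" unchoking rule:
# periodically unchoke the peers that have recently uploaded to us the
# fastest. Illustration only; BitTorrent's real algorithm is smarter
# (for one thing, it adds optimistic unchoking).

def choose_unchoked(peers, upload_rate_to_us, slots=4):
    """Return the `slots` peers with the best recent upload rate to us."""
    ranked = sorted(peers, key=lambda p: upload_rate_to_us[p], reverse=True)
    return set(ranked[:slots])

# Made-up rates (KB/s) for five hypothetical peers.
rates = {"A": 50.0, "B": 10.0, "C": 75.0, "D": 0.0, "E": 30.0}
unchoked = choose_unchoked(rates, rates, slots=3)  # C, A, and E win the slots
```

The weakness of the naive version is exactly what the quote hints at: with nothing but reciprocity, newcomers who have nothing to trade yet never get started.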

So in their simulations peers have 4-6 neighbors with a strong preference for 4. BitTorrent in the real world typically uses 30-50.

“30-50” is not some magical constant. It’s what happens to work for BitTorrent, which is a different protocol than Avalanche. Absent any actual explanation of why so many are necessary, or why 4-6 are insufficient, saying that BitTorrent uses 30-50 only betrays a certain narrowness of vision.

They also don’t simulate varying transfer rate abilities, transfer rate abilities varying over time, or endgame mode.

…yet. Maybe they will. Maybe they already planned to. As already noted, BitTorrent didn’t spring fully formed from anyone’s brow either. Give the guys a chance to work through their ideas.

In other words, intentionally or not, the simulation is completely rigged against BitTorrent.

Rigging is by definition intentional, so “unintentionally rigged” is a contradiction in terms – and it’s a strong accusation besides. Maybe the comparison to BitTorrent really isn’t valid, but that doesn’t mean anything was rigged.

What error correction can in principle help with is that it increases the chances that any given peer has data which is of interest to another peer. In practice this isn’t really a problem, because rarest first does a very good job of piece distribution, but error correction can in principle do as well as is theoretically possible, and rarest first is in fact less than perfect in practice.

This is the one piece of Bram’s harangue that actually seems to be responsibly presented, so I thought I’d make note of it.
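Rarest first itself is easy to state: among the pieces a peer still needs, prefer the one held by the fewest neighbors, so that scarce pieces get replicated before they can disappear from the swarm. A minimal sketch of the idea (my own illustration, not BitTorrent’s actual code):

```python
from collections import Counter

def rarest_first(needed, neighbor_piece_sets):
    """Pick the needed piece held by the fewest neighbors (ties arbitrary)."""
    counts = Counter()
    for pieces in neighbor_piece_sets:
        for piece in pieces:
            if piece in needed:
                counts[piece] += 1
    # A piece no neighbor has cannot be requested at all yet.
    available = [p for p in needed if counts[p] > 0]
    return min(available, key=lambda p: counts[p]) if available else None

needed = {0, 1, 2, 3}                         # pieces we still lack
neighbors = [{0, 1, 2}, {0, 1}, {0, 2}, {1}]  # what each neighbor holds
pick = rarest_first(needed, neighbors)        # piece 2: only two neighbors have it
```

The imperfection Bram concedes is visible even here: each peer only counts pieces among its own neighbors, so its view of “rarest” is local and can be wrong globally – which is the gap error correction can in principle close.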

One thing badly missing from this paper is back-of-the-envelope calculations about all of the work necessary to implement error correction. Potential problems are on-the-wire overhead, CPU usage, memory usage, and disk access time.

Yes, some back-of-the-envelope calculations would have been nice. However, the rest is just FUD, because not all of the factors Bram mentions are likely to matter. For example, fragments could be kept in a memory-only cache until they are positively determined to be useful, and simply discarded if the cache is too full (after all, the same information can be derived from other fragments). Therefore, disk access would not be the killer that Bram wants to think it is but would actually be irrelevant with respect to fragment processing.
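To make the cache idea concrete, here is one way it could be sketched – entirely my own assumption about a possible design, not anything from the Avalanche paper. Unverified fragments live in a bounded in-memory cache; when it fills, the oldest fragment is simply dropped, which is safe because coded fragments are redundant:

```python
from collections import OrderedDict

class FragmentCache:
    """Bounded, memory-only cache for coded fragments awaiting a usefulness
    check (a hypothetical design, see text). Discarding a fragment is cheap
    because equivalent information can be derived from other fragments."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._frags = OrderedDict()          # fragment id -> payload

    def add(self, frag_id, payload):
        if frag_id in self._frags:
            return
        if len(self._frags) >= self.capacity:
            self._frags.popitem(last=False)  # evict the oldest fragment
        self._frags[frag_id] = payload

    def take_if_useful(self, frag_id, is_useful):
        """Remove the fragment; return it only if it passes the check."""
        payload = self._frags.pop(frag_id, None)
        if payload is not None and is_useful(payload):
            return payload                   # only now does it touch the disk
        return None
```

Under such a scheme, disk writes happen only for fragments already judged useful, which is why disk access time need not dominate fragment processing.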

Would such an approach actually work, without causing unacceptable levels of retransmission or recalculation? I don’t know, actually. In fact I kind of doubt it, but I don’t claim that their design will fail because of a factor that I only assumed would matter because that’s the only way I personally could imagine doing it (see previous paragraph about “smartest boy in class” syndrome).

The really big unfixable problem with error correction is that peers can’t verify data with a secure hash before they pass it on to other peers.

It’s a problem, perhaps a big one, but not an unfixable one. If you sign fragments instead of final-form blocks, it’s not a problem at all – though that does increase the number of signatures that must be managed and that could in itself be problematic.
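As one illustration of signing fragments rather than final-form blocks, suppose the publisher computes a digest per fragment and signs the digest list once; peers then check each received fragment against that list before forwarding it. This is my own sketch, it assumes a fixed set of publisher-generated fragments, and the signature machinery itself is omitted:

```python
import hashlib

def fragment_digests(fragments):
    """Publisher side: digest every fragment. In a real system this list
    would itself be signed once and shipped with the torrent metadata
    (signature machinery omitted from this sketch)."""
    return [hashlib.sha256(f).hexdigest() for f in fragments]

def verify_fragment(fragment, index, signed_digests):
    """Peer side: check a received fragment against the signed digest list
    before passing it along to other peers."""
    return hashlib.sha256(fragment).hexdigest() == signed_digests[index]
```

The catch is the one already noted: the digest list grows with the number of fragments, and managing all those hashes could itself become the problem.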

As you’ve probably figured out by now, I think that paper is complete garbage.

As you’ve probably figured out by now, I think Bram’s response to the paper is mostly garbage. He raises a couple of good points, but the invective with which he delivers them just shows that the real problem is ego. Bram thinks he’s the god of swarming downloads, but other people – people he knew – did it earlier and quite possibly better. For years he tried to pretend that a centralized tracker was OK, even though many people (including myself) had told him decentralized would be better and even explored ideas of how to do it. Then Azureus beat him to the punch and my, did his tune change in a hurry. As far as I know, though, both their protocol and his (he just had to do it his way) take inadequate precautions against malicious corruption of tracker information, which is just as serious as the data-signing problem he uses to bash Avalanche. Maybe some of the energy he used to compose FUD about a project he views as competition to his now-commercial interest could have been used to fix BitTorrent instead.