I actually started composing this in my head before Matt Reilly’s post about Green Computing, but it kind of fits in with some of the points he makes about communication speed as a factor in overall performance. One of the properties of a Kautz graph, like the ones we use in our systems at SiCortex, is this:

For a fixed degree M and number of vertices V = (M + 1)M^N, the Kautz graph has the smallest diameter of any possible directed graph with V vertices and degree M. (from the Wikipedia article)

UPDATE 2008-11-10: I happened to read this while looking for something else, and realized that there are better torus routing methods than those I had considered. I think I’ve invented an even better one than the current general practice, but for now I’ve updated the text below to reflect the general-practice numbers (which don’t affect my argument at all).

Yeah, lovely, what does it mean? What it means is that if you want to make a system of a certain size, let’s say approximately 1000 nodes, and you’re constrained as to how many links per node you can have, you can achieve the smallest network diameter by arranging your nodes in a Kautz graph. For example, our 972-node system using three outbound links per node (i.e. degree=3) has diameter=6. That’s the maximum number of hops to get from a node X to any other node Y; the average is approximately five. By contrast, for a 10x10x10 hypertorus also using three outbound links per node, the diameter would be 18 and the average hop count would be almost 10. I’m not actually familiar with the proof for the statement quoted above and wouldn’t understand it even if it were shown to me, but it seems very believable based on my experience. Every time I try to think about tweaking some other topology to reduce the average hop count, or to bring the links per node down to reasonable levels (you can’t feasibly build a system that requires dozens of links on every node), the result starts to look more and more like a Kautz graph in terms of routing, partitioning, and tiling characteristics.
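If you’d rather see those numbers fall out of the definition than take my word for it, here’s a little Python sketch (a toy construction for illustration, nothing to do with our actual routing code) that builds the degree-3 Kautz graph on 972 nodes and measures its diameter and average hop count by brute-force BFS; it should report a diameter of 6 and an average of about five:

```python
from collections import deque
from itertools import product

def kautz_graph(M, length):
    # Vertices are strings of `length` symbols over an alphabet of M+1
    # symbols with no two adjacent symbols equal; that gives (M+1)*M**(length-1)
    # of them. Each vertex has M outgoing edges: shift left and append any
    # symbol other than the new last one.
    alphabet = range(M + 1)
    vertices = [v for v in product(alphabet, repeat=length)
                if all(a != b for a, b in zip(v, v[1:]))]
    return {v: [v[1:] + (c,) for c in alphabet if c != v[-1]] for v in vertices}

def hop_stats(adj):
    # BFS from every vertex; return (diameter, average hops) over all
    # ordered pairs of distinct vertices.
    total = pairs = diameter = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for v, d in dist.items():
            if v != src:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return diameter, total / pairs

adj = kautz_graph(M=3, length=6)        # 4 * 3**5 = 972 nodes, degree 3
d, avg = hop_stats(adj)
print(len(adj), "nodes, diameter", d, ", average hops %.2f" % avg)
```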

So far, so good, but how does that translate into anything useful? Well, if communication speed is a factor in overall performance, and communication speed falls through the floor when the system’s communication capacity is exceeded, then that capacity matters a lot. The common way to compare communication capacity between systems is bisection bandwidth: how much capacity must be removed to divide the machine into two equal parts? It turns out that measuring the bisection bandwidth of a Kautz graph is not straightforward, and even the Kautz cognoscenti at work seem to disagree about the result. Personally, I think the whole bisection-bandwidth approach is bass-ackwards. Why measure the negative (minimum capacity you could remove) instead of the positive (maximum capacity you could use)? I think it happened because a naive attempt to measure maximum usable capacity would allow a system to be partitioned into small islands communicating amongst themselves, yielding inflated figures that don’t match what a real application running across the entire system could get. That’s fixable in a measure-the-positive approach, though. All you need to do is say that you’re going to measure the maximum bandwidth when each node in one half of the system sends equal amounts of traffic to each node in the other half. That way, if you want to combine the figures for two islands (often corresponding to two switches or cabinets), then you have to pass half of your traffic between them, and what you get is a pretty fair approximation of what an actual full-scale application could achieve.
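Here’s a rough sketch of what that measurement looks like. The way I split the halves and the single-shortest-path routing are simplifications of my own for illustration (real routing can spread traffic across multiple paths, and the choice of halves matters), but it shows the shape of the calculation: route every cross-half pair, find the busiest link, and see how much aggregate traffic the pattern can carry before that link saturates.

```python
from collections import deque, Counter

def half_to_half_ceiling(adj):
    # Every node in one half sends equal traffic to every node in the other
    # half. Route each pair along one BFS shortest path, tally load per link,
    # and report how many links' worth of bandwidth the pattern sustains when
    # the busiest link hits 100% utilization.
    nodes = sorted(adj)
    half_a = set(nodes[:len(nodes) // 2])
    load = Counter()
    pairs = 0
    for src in nodes:
        prev = {src: None}                  # BFS shortest-path tree from src
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in prev:
                    prev[w] = u
                    queue.append(w)
        targets = [v for v in nodes if (v in half_a) != (src in half_a)]
        for dst in targets:
            pairs += 1
            v = dst
            while prev[v] is not None:      # charge every link on the path
                load[(prev[v], v)] += 1
                v = prev[v]
    # With unit-capacity links, each pair can send 1/max_load, so the whole
    # pattern delivers pairs/max_load links' worth of traffic at once.
    return pairs / max(load.values())

ring = {i: [(i + 1) % 8] for i in range(8)}  # toy 8-node one-way ring
print("%.2f links' worth" % half_to_half_ceiling(ring))
```

You can feed it the kautz_graph() adjacency from the earlier sketch to get the same kind of figure for the 972-node case, though with single-path routing the estimate will typically come in somewhat below the links-divided-by-average-hops ceiling I talk about next.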

This is where network diameter and particularly average hop counts start to matter. The maximum usable communication capacity of a system is the aggregate link speed divided by the average hop count. Therefore, even if you can distribute your traffic perfectly across all links in the system for the test I propose above, your performance ceiling will be determined by the average hop count in your topology. For example, in our 972-node Kautz graph that’s 2916 links’ worth divided by an average hop count of five, or 583.2 links’ worth total. For that 1000-node hypertorus, it’s 3000 links’ worth divided by an average hop count of 10, or only 300 links’ worth for a similarly sized system. Looked at another way, your links would have to be almost twice as fast just to keep up. If you made those links bidirectional you’d double the number of (half-duplex) links and reduce your average hop count to around 7.5, so you’d be getting 800 links’ worth of total capacity, but then you’d be talking about a system with vastly greater hardware complexity and cost than the others. If you wanted to compare apples to apples, a thousand-node Kautz graph with degree=6 would have the same 6000 links with an average hop count around 3.5, so the hypertorus would still be at a serious disadvantage.

Then you have to consider that we’re talking about ceilings here, and that even in the abstract different topologies will allow for different levels of “perfection” as regards distributing traffic among all the links in the system. Maybe I’ll explore some of those issues at some point in the future. For now the key point is that, much as a more efficient algorithm will always eventually win out over a faster implementation, a more efficient topology will always eventually win out over raw bandwidth. In my opinion, bisection bandwidth as usually measured doesn’t tell that story well enough.
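To make the arithmetic above easy to check or play with, here’s the ceiling calculation distilled into a couple of lines; the link counts and average hop counts are the ones quoted above:

```python
def usable_links(total_links, avg_hops):
    # Every byte delivered occupies avg_hops links on its way through the
    # network, so the ceiling on usable capacity is total links / avg hops.
    return total_links / avg_hops

print(usable_links(2916, 5.0))   # 972-node degree-3 Kautz graph       -> 583.2
print(usable_links(3000, 10.0))  # 10x10x10 torus, 3 links per node    -> 300.0
print(usable_links(6000, 7.5))   # same torus, bidirectional links     -> 800.0
print(usable_links(6000, 3.5))   # 1000-node degree-6 Kautz graph      -> ~1714
```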