Consistent hashing is one of those techniques that has practically become standard in many high-scale environments, and yet somehow remains little known in the broader community. As evidence of the first point, consider Facebook/Twitter/Digg using Cassandra, or the many memcached deployments that use the Ketama enhancements. As for the second, I’ve more than once mentioned consistent hashing to architect-level folks who, despite being generally well informed in many areas, indicated that they’d never even heard the term (let alone appreciated its significance). Weird. Anyway, one of the best descriptions I know of is by Tom Kleinpeter. Instead of trying to repeat what Tom says, I’ll pick up where he leaves off.

The first additional point I’d like to make is that the circle method is not the only way to do consistent hashing. All you really need is some way to map both keys and servers into a common space, plus a distance metric within that space. It just happens that the circle method has some nice properties: finding the closest server to a particular key is a simple range check, and finding the second-closest (third-closest, etc.) server to provide redundancy is just as easy. On the other hand, other metrics such as the XOR metric used by Kademlia have properties that are desirable in a DHT. As always, the precise tool must be matched to the job at hand, but for the common case of a LAN-oriented cache or key/value store I think circles are a good choice.
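
To make the circle method concrete, here’s a minimal sketch in Python (the 32-bit ring, the MD5 hash, and the names `ring_hash` and `Ring` are illustrative choices of mine, not part of any standard). It follows the convention used in this post: a key belongs to the nearest server point at or below its position on the ring, and walking further in that direction yields the second-closest server, third-closest, and so on.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Map an arbitrary string onto the ring's key space (32 bits here).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    """Bare-bones circle method: one point per server, no redundancy tricks."""

    def __init__(self, servers):
        # A sorted list of (position, server) pairs is the whole circle.
        self._points = sorted((ring_hash(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def closest(self, key: str, count: int = 1):
        # Find the nearest server point at or below the key's position
        # (index -1 wraps to the highest point, closing the circle)...
        i = bisect.bisect_right(self._positions, ring_hash(key)) - 1
        # ...then keep walking in the same direction for the second-closest
        # server, third-closest, and so on. Walking gives distinct servers
        # as long as count does not exceed the number of points.
        return [self._points[(i - n) % len(self._points)][1]
                for n in range(count)]
```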

Next, I’d like to point out that the ability to have a single node show up at multiple points on the ring is incredibly powerful. Tom presents this as mostly a load-balancing trick, but its real strength lies in what happens when a node fails. If each node appears only once on the ring, then when it fails its immediate predecessor assumes 100% of its load. However, if it appears four times on the ring, then when it fails its load will be distributed across four predecessors. There’s a catch, though. It’s easy to fall into the trap of having a node appear N times by giving it N keys equally spaced around the ring. If every node does this with the same N, the nodes appear in the same relative order in every segment of the ring, so each of node X’s N keys is likely to be immediately preceded by a key for the same node Y – and Y ends up absorbing 100% of the load if X should fail. This is easily avoided, e.g. by using X’s “main” key as the seed for a pseudo-random number sequence that determines the placement of its remaining N-1 keys, but the problem is still evident in many consistent hashing implementations.
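
Here’s roughly how the seeded-placement fix might look, reusing the hypothetical `ring_hash` helper from the sketch above. The placement stays deterministic – anyone who knows the node’s name computes the same positions, so every client agrees on the ring – but each node’s extra points land independently of every other node’s.

```python
import random

def node_points(node: str, n: int = 4):
    # The node's "main" key goes on the ring as-is...
    main = ring_hash(node)
    # ...and also seeds a PRNG, so the remaining n-1 positions are
    # reproducible from the node's name alone, yet scattered around the
    # ring with no correlation to any other node's positions.
    rng = random.Random(main)
    return [main] + [rng.randrange(2 ** 32) for _ in range(n - 1)]
```

Contrast this with equal spacing: if every node derived its positions as main plus fixed multiples of 2^32/N, the ring would look identical in every segment, and the same neighbor would sit behind all N of a node’s points.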

Lastly, I’ll add another warning: consistent hashing doesn’t account for correlated failures. As with any randomized resource assignment, there’s a certain probability that both the closest and the second-closest server keys for a particular data key will be in the same failure domain – e.g. nodes in the same rack. With equal-size racks, this will happen with probability roughly 1/rack_count. You could try to manipulate key assignment to avoid this, but it’s likely to be a mess, especially if you’re also trying to avoid the “failed node’s load all goes to the same backup” problem discussed in the previous paragraph. If you’re the sort of person who actually cares about making sure replication occurs to a different rack, you’re probably better off with a two-ring solution – one ring to determine rack assignment and another to determine server-within-rack assignment – instead of trying to “trick” a single-ring setup into doing the right thing.
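
As a sketch of how the two rings might compose (building on the hypothetical `Ring` class from the first example, and assuming a `{rack: [servers]}` input shape): the outer ring walk yields distinct racks, so no two replicas of a key can share that failure domain, and each rack’s own inner ring picks the server within it.

```python
class TwoLevelRing:
    def __init__(self, racks):
        # racks: {rack_name: [server, ...]} -- an assumed input shape.
        # One outer ring of racks, plus one inner ring per rack.
        self._rack_ring = Ring(list(racks))
        self._server_rings = {r: Ring(members) for r, members in racks.items()}

    def replicas(self, key: str, count: int = 2):
        # Each rack has exactly one point on the outer ring, so walking it
        # yields `count` distinct racks (assuming count <= rack count)...
        racks = self._rack_ring.closest(key, count)
        # ...and each chosen rack's own ring picks the server within it.
        return [self._server_rings[r].closest(key)[0] for r in racks]
```

A nice side effect is that losing an entire rack just shifts its keys to the neighboring rack on the outer ring, without perturbing server assignments inside the unaffected racks.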