Ever since I touched on the “Gifts of the Magi” problem in Reflections on Code, I’ve wanted to write a more focused technical article about it. I’ve been turning examples and explanations over in my head ever since, and I think I’ve figured out how I want to say it. Let’s start by looking at some examples, both technical and non-technical.

  • You’re walking down a hallway, and you meet someone going the other way. You move left, they move right. You move right, they move left. You step aside at exactly the same moment they step aside. Eventually you hit on a combination that allows you both to proceed.
  • You’re in the left lane on a highway, coming up behind someone. You wait a few moments, giving them a chance to move over (like you undoubtedly think they should have done already). Eventually you give up and move to pass them on the right…just at the moment that they finally move over to let you pass.
  • Two nodes sharing a communications medium try to transmit at the same time, garbling each others’ data.
  • A filesystem, a volume manager, a block I/O subsystem, and a RAID controller are all trying to optimize their access patterns, but nothing ever actually seems to get better.

What these scenarios all have in common is multiple entities each trying to adapt, but failing because what they’re adapting to is itself trying to adapt in a different (often opposite) direction. In the remainder of this article, I’ll describe in greater detail both how bad this can get and how it can be avoided.

Sometimes, coordination problems only affect performance, as in one example from my own experience that actually did involve a block I/O system and a RAID controller. The RAID controller was one of a pair, set up in such a way that if a request came in to the “passive” controller it would be forwarded to its “active” partner. If the host was connected to both controllers anyway, this was obviously less efficient than sending the request to the active controller in the first place. This suboptimal situation could be detected at either end, and corrected at either end. If the controllers decided to change active/passive roles at the same time that the host decided to change which path it used, though, throughput would be no better than before. In fact, since such changes can be at least somewhat disruptive, performance might be worse than just leaving things as they were. Yes, I have sat and watched while host software and array software chased each other back and forth from one path/controller to another, almost infinitely. It’s kind of funny for a while, then it’s just stupid.
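The path-chasing behavior can be boiled down to a toy model. Everything here is illustrative (the names, the two-path topology, and the lockstep timing are my own simplifications, not anything from real host or array software), but it shows why simultaneous adaptation never converges:

```python
# Toy model of the path-flapping livelock: the host moves to whichever
# controller it believes is active, while the array makes whichever
# controller the host is using the active one. Applied at the same
# instant, the two fixes chase each other forever.

def step(host_path, active_ctrl):
    """One lockstep round of simultaneous adaptation by both sides."""
    new_host_path = active_ctrl   # host switches to the "active" controller
    new_active_ctrl = host_path   # array activates the controller in use
    return new_host_path, new_active_ctrl

host, active = "A", "B"           # start misaligned: requests get forwarded
history = []
for _ in range(6):
    history.append((host, active))
    host, active = step(host, active)

print(history)
# Alternates ('A', 'B'), ('B', 'A'), ('A', 'B'), ... -- never aligned.
```

If either side simply held still for one round, the other's adaptation would succeed immediately; it is the symmetry of the responses, not either response alone, that causes the livelock.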

The issues can be worse than just performance, though. One basic design rule for a highly available (HA) system is to minimize the number of system states that must be accounted for during failure recovery, and that number is the product of the number of states of each component. Adaptations such as active/passive role switches usually add one or more states for the components involved. In a system of a mere half-dozen components, with each already having only two operational states, adding even one more adaptation state per component increases the number of system-wide states by more than an order of magnitude. The need to reduce the number of states is why every successful HA design involves carefully controlling when and how changes to its internal configuration may occur. Otherwise you have a system that not only performs poorly, but is at high risk of failing to meet its primary design goal.
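The arithmetic behind that claim is worth making explicit. With six components, going from two states each to three multiplies the system-wide state space by (3/2)^6:

```python
# State-space growth for the half-dozen-component example above.
components = 6

two_state_system = 2 ** components    # two operational states each
three_state_system = 3 ** components  # one extra adaptation state each

print(two_state_system)    # 64 system-wide states
print(three_state_system)  # 729 -- more than 11x larger
```

Every one of those states is a combination that failure-recovery logic must, at least in principle, handle correctly, which is why the growth matters far more here than it would in ordinary steady-state operation.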

There are two basic approaches to these sorts of coordination problems: hierarchical, and peer to peer. The hierarchical approach, which is really the less interesting of the two, can be subdivided in a couple of ways. One is whether the “leader” is internal or external to the conflict being resolved. In the hallway example, an internal hierarchical solution might be summed up as saying the most senior person gets to go first. In another common real-life situation, one of the primary roles of a manager (director, VP, CEO, board of directors) is to act as a leader by resolving conflicts between subordinates. The other distinction is whether the hierarchy is fixed or situational. For example, in the highway example the “rules of the road” (keep right, merging traffic must yield) define a form of situational leadership that we call right of way.

The great thing about hierarchical solutions to the coordination problem is that they’re efficient. Once you know who’s in charge, you can just delegate all decision-making to them. In the computing world this is particularly appealing since the candidates for leadership are not tainted by self-interest and often have identical ability to perform the role. If your system already embodies some notion of a leader for some reason, it’s almost certain that your wisest course when you run into some new coordination problem is to take advantage of that leader’s existence. If all that leader does is designate a new “sub-leader” to solve the new problem, that’s fine. Even if the problem that the original leader solved has almost nothing to do with the one you’re solving now, that’s just a minor bit of refactoring. In a relatively small and closed system, protocols for electing a leader and electing a new one if the old one dies are so well understood nowadays that there are better ways to spend your design time.

Things get a bit more complicated when it’s not feasible to elect a leader. This often happens in systems with large latencies and/or dynamic membership, where the time to elect (or even find) a leader might be longer than the window in which the conflict needs to be resolved. In these cases a very different kind of protocol is required – one which resolves conflicts between participants without designating one as having priority over another. For example, consider the shared-medium example given above. Basic networking books (e.g. Tanenbaum) are full of solutions, some hierarchical, but one of the best known is the CSMA/CD protocol once used in Ethernet. The basic idea is that if two nodes detect a collision (i.e. both trying to transmit at the same time) then they each try to retransmit. If they both waited the same time, though, they’d just collide again. The key to the protocol is the introduction of a small random factor so that collisions between the same two nodes become successively less likely until one gets through. There’s no reasonable way either could be called a leader, but the protocol works anyway.
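The random factor can be sketched in a few lines. This is a simplification of the idea, not the real Ethernet algorithm (actual CSMA/CD used truncated binary exponential backoff over fixed slot times, with a retry limit):

```python
# Randomized backoff in the spirit of CSMA/CD: after the nth collision,
# wait a random number of slots drawn from a range that doubles each time.
import random

def backoff_delay(attempt, slot_time=1.0, max_exp=10):
    """Return a delay after the given collision attempt (0-based).

    Two colliding nodes draw their slots independently, so the chance
    that they pick the same slot (and collide again) roughly halves
    with each successive attempt.
    """
    slots = random.randrange(2 ** min(attempt, max_exp))
    return slots * slot_time

# Two nodes that have collided three times rarely choose the same delay:
node_a = backoff_delay(3)   # somewhere in 0..7 slot times
node_b = backoff_delay(3)
print(node_a, node_b)
```

The protocol is entirely symmetric: neither node is told to defer to the other, yet the probability of repeated collision shrinks geometrically, which is exactly the leaderless conflict resolution the paragraph above describes.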

In a more complex sort of system, the idea of a “hold” request can be very useful. For example, consider my earlier RAID-controller example. If the host had just been able to tell the RAID controllers about an impending change before it happened, there would be no problem with conflicting changes (and likewise if the RAID controllers had been able to tell the host). Whoever received such a message could simply freeze or reset whatever information base they had used to decide that they needed to make a change themselves, and wait to see how the new status quo worked out. It’s entirely possible that a race could occur between each side telling the other about an impending change, but resolving that race is completely trivial. Just including a random number in each request and allowing the lower number to “win” would be sufficient, and it doesn’t get much simpler than that.
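The tie-break for crossing hold requests is simple enough to show directly. The message format and names here are hypothetical (neither the host software nor the array in my example actually worked this way), but the mechanism is the one just described:

```python
# Hypothetical hold-request tie-break: each side tags its "I'm about to
# change" message with a random ticket. If the two requests cross in
# flight, the side holding the lower ticket proceeds; the other freezes
# its adaptation logic and waits to see how the new status quo works out.
import random

def may_proceed(my_ticket, peer_ticket):
    """Return True if our change wins the race, False if we must hold.

    A tie (astronomically unlikely with enough random bits) resolves as
    a hold on both sides; a real protocol would just retry with fresh
    tickets.
    """
    return my_ticket < peer_ticket

host_ticket = random.getrandbits(32)
array_ticket = random.getrandbits(32)

host_wins = may_proceed(host_ticket, array_ticket)
array_wins = may_proceed(array_ticket, host_ticket)
assert not (host_wins and array_wins)  # at most one side ever proceeds
```

Note that the loser doesn’t abandon its change forever; it merely resets its decision inputs and re-evaluates after the winner’s change has settled, which is what prevents the two sides from adapting against each other simultaneously.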

The hold-request approach mostly still works even in more complex multi-layered arrangements, though in any kind of topology that allows loops it can result in strange kinds of fluctuations involving three or more nodes. This is the realm in which routing protocols live, which I consider one of the most complex and subtle and therefore interesting problem domains in computing, but I’d need an article even longer than this one just to scratch the surface. For now, I hope I’ve managed both to explain why simpler types of coordination problems are still important and to provide a few helpful tips on how to address them.