I briefly alluded to this in an earlier post, and figure it’s time for a more complete explanation. Let me just get one thing out of the way first: I am not so presumptuous as to think this can compete with Brewer’s CAP Theorem. It’s almost more of a corollary or elaboration, to clarify certain issues that arise when thinking or talking about CAP.

So, what is TAG? It’s Timely Agreement in a Global system. Like CAP, the idea behind TAG is that a system can have only two of the three named properties. Timeliness is pretty intuitive and self-explanatory. Agreement is also intuitive, referring to most of what other terms such as consistency or consensus would mean.

“Global” requires a little more explanation. It refers to systems that are distributed not only in the sense of many nodes but also in the sense of many places, where long-lived partitions are a serious concern. Think WAN, not LAN. Far, not near. “Global” does not include distributed systems where latency is low, bandwidth is free, and partitions are rare or transient or both. (We really need two different terms instead of using “distributed” for both, but that’s a rant for another time.) At the risk of sounding elitist, I’ll say that distribution within a site is an operational rather than algorithmic problem, or tactical vs. strategic. It’s not really what CAP or TAG are about.

So, what tradeoffs does TAG allow?

  • First up is TA – Timely Agreement. Everyone gets quick answers, and they’re the same answers for everyone. This is what a well-designed non-Global system is likely to have. It’s also what most people who claim to have refuted CAP turn out to have, because they’re not really thinking about the world that CAP describes.
  • Second is AG, or perhaps GA – Global Agreement. You get consistency, so long as you don’t mind waiting. This is where most people will end up when they just try to run a TA system across sites, without thinking about CAP and such. Locks and timeouts can be made to work in a local environment, but completely fail in a global one. Other approaches such as MVCC offer even better concurrency and fault behavior, but – at least in the forms that would be ideal for local environments – tend to fail at global scale too. Thus, people who just add G often unintentionally lose T in the process.
  • Last is TG – Timely and Global. You get quick answers, no matter where you are or whether partitions exist, but to do that you have to weaken agreement (consistency). Systems in this space tend to be characterized less by at-the-time synchronization, in any form, than by asynchronous queues with after-the-fact repair and conflict resolution.
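
To make the TG option a little more concrete, here is a minimal sketch of the "asynchronous queue plus after-the-fact repair" shape. Everything in it is an assumption made for illustration: the `Replica` class, the `reconcile` function, the caller-supplied timestamps, and the last-writer-wins rule are stand-ins, not a description of any particular system. Real TG systems often use richer conflict resolution (vector clocks, application-level merges), and the outbox handling here only works for two sites.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One site in a hypothetical two-site TG key-value store."""
    name: str
    store: dict = field(default_factory=dict)   # key -> (timestamp, site, value)
    outbox: list = field(default_factory=list)  # updates not yet shipped to the peer

    def write(self, key, value, timestamp):
        # Timely: apply locally and return at once, never waiting on the far site.
        update = (key, (timestamp, self.name, value))
        self.apply(update)
        self.outbox.append(update)

    def read(self, key):
        entry = self.store.get(key)
        return entry[2] if entry else None

    def apply(self, update):
        key, version = update
        current = self.store.get(key)
        # Last-writer-wins (an assumed policy): the higher (timestamp, site)
        # pair survives, with the site name breaking timestamp ties.
        if current is None or version[:2] > current[:2]:
            self.store[key] = version


def reconcile(a, b):
    """After-the-fact repair: exchange queued updates once the link is back."""
    for update in a.outbox:
        b.apply(update)
    for update in b.outbox:
        a.apply(update)
    a.outbox.clear()
    b.outbox.clear()


if __name__ == "__main__":
    east, west = Replica("east"), Replica("west")
    east.write("owner", "alice", timestamp=1)   # both writes succeed immediately,
    west.write("owner", "bob", timestamp=2)     # even while the sites are partitioned
    print(east.read("owner"), west.read("owner"))  # the sites disagree for now
    reconcile(east, west)
    print(east.read("owner"), west.read("owner"))  # and agree after repair
```

The point of the sketch is where the waiting went: `write` never blocks on the other site, so timeliness survives a partition, and the price is the window between the writes and `reconcile` during which the two sites give different answers. An AG design would instead make `write` wait for the remote site before returning.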

Relating TAG to CAP is an interesting exercise. Consistency in CAP and Agreement in TAG are practically the same thing. It’s not quite right to translate CAP’s Partition tolerance into TAG’s Global, but it’s close. Where the direct mapping really falls down is CAP’s Availability vs. TAG’s Timeliness. They’re not the same thing at all; performance or time-boundedness has always been the hidden fourth dimension of CAP, and is made explicit in TAG while the often misleading “availability” is pushed into the background. Most AP systems in CAP are also TG in TAG.

That’s why I don’t think TAG really stands alone, but is more of a commentary on CAP. You can think of CAP plus Time as a tetrahedron, normally viewed from a certain angle and thus appearing as a triangle. By viewing the same shape from a second angle, though, some aspects might come into better focus. Other projections of the same shape are left as an exercise for the reader.