Stu Charlton has written an interesting post called Confused CAP Arguments. The tone is mostly civil and reasonable, though it’s a bit disappointing to see some of the negative stereotypes (“cocktails…newly-graduated devs”) and strawmen (“only using CAP”) repeated yet again. Oh, well. The issue of “is CA possible” seems to be rather central to Stu’s post, and it also came up in my last post on the subject, so I’ll delve into that aspect a bit and even share some of the war stories that I promised to.
First, though, a bit of background. Back in 1993, before either Stu or the current generation of anti-RDBMS hipsters entered the workforce, I was working on HACMP/6000 at Clam. At first I worked on the lock manager, which can be thought of as a key/value store where the values are very small but have complex semantics. A lot of the hashing and replication techniques it used would be instantly recognizable to NoSQL folks. Later, I was responsible for the network routing and monitoring infrastructure. The “DGSP” (Die, Gravy Sucking Pig) message that we used to enforce quorum was my invention. I became pretty intimately familiar with all of the ways such systems could fail, and in particular with how even systems with redundant network hardware and cutting-edge network software could still experience network partitions.

To address one of Stu’s points, about treating network failures as the only kind: one observation I made pretty early on is that for all practical purposes they are the only kind. It’s not hard to promote any local failure to a node failure, and it greatly simplifies the rest of the system if you do.

Anyway, the real point is that I might not have been a database guy myself, but I directly supported them (including time on-site at Oracle HQ), so I have very little patience for database guys trying to pretend that only they have the proper context to understand the application or business value of various tradeoffs in this area. IBM wouldn’t have sold tens of thousands of HACMP licenses if we hadn’t thoroughly understood those same issues. So, back to the issue at hand. Here’s one of the less coherent parts of Stu’s post.
What is Partition Intolerance?
This is never well defined. Like lactose intolerance, the results, can be, err, unpredictable. It seems to imply one of: complete system unavailability, data inconsistency, or both.
First, data inconsistency is a sacrifice of C. That’s pretty clearly defined. Gilbert and Lynch defined P thus: the system continues to operate despite arbitrary message loss. That’s still a bit vague, both in terms of “the system” (all or part of it) and “continues to operate” (as before, or degraded somehow). I tried to provide a clearer interpretation a while back, distinguishing P from A by contrasting the effect of a partition on requests vs. nodes. In short, a non-A system fails some nodes, while a non-P system fails some requests. Some might disagree with that definition – several did in the comments to that article – but I think it’s clear enough to make “never well defined” untrue.
What Stu does get absolutely right is that the effect of a partition in a partition-intolerant system can be unpredictable. This is why I don’t like CA systems much. If you’ve partitioned your data somehow and some nodes become unreachable, then some requests will be unable to reach the data (or acquire the locks) that they need, and a CA system must either block or fail such requests. If it could satisfy them by eliminating possible sources of conflict, e.g. by forcing nodes down and breaking their locks, then it would be a CP system (with reduced capacity). If it could satisfy them by allowing the conflict and hoping to perform some sort of resolution later, it would be an AP system (with reduced consistency). As a CA system, preserving both capacity and consistency, block or fail are the only options.

Failing requests is the better option, but is unfortunately rarely chosen because people don’t usually seem to choose CA. Instead, they fall into it because they try to deny that partitions could occur. They have unbounded faith in the efficacy of redundant networks. Anybody who has seen the dark underbelly of network infrastructure knows how misplaced that faith can be. In “CAP-naive” systems, the most common behavior I used to see was that some requests would succeed and others would block indefinitely waiting for some resource they needed.

It’s bad enough that hash-based resource distribution would cause these blockages to be completely unpredictable, but there’s an even bigger iceberg to worry about: cascading failures. Database requests often need many resources, and if they block unexpectedly while acquiring the fifth then they block while still holding the other four. That would often cause even more requests to block waiting for those four, and so on until practically every resource is unavailable and the whole system locks up. This isn’t theoretical. I’ve seen it happen many, many times. I’ve seen it happen almost instantaneously when a critical resource becomes unreachable in a busy system, and I’ve seen it unfold slowly over hours or even days when a less popular resource did so in a lightly loaded system. It’s not some exceptional exotic behavior in CA systems. It’s the norm when partitions occur – and they do.
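To make that cascading-blockage mechanism concrete, here is a toy sketch (mine, not from HACMP or any real system; every name and number in it is made up). Requests acquire a handful of hash-distributed resources in order, and any request that hits an unreachable node, or a resource held by an earlier victim, blocks forever while keeping everything it already acquired:

```python
import random

NODES = 8
UNREACHABLE = {3}                 # pretend one node just dropped off the network
RESOURCES = [f"res{i}" for i in range(200)]
node_of = {r: i % NODES for i, r in enumerate(RESOURCES)}   # stand-in for hash placement

held = {}                         # resource -> id of the blocked request still holding it

def run_request(req_id, needed):
    """Acquire resources in order; 'block' at the first one we can't get."""
    acquired = []
    for r in needed:
        if r in held or node_of[r] in UNREACHABLE:
            for a in acquired:    # block while still holding everything acquired so far
                held[a] = req_id
            return False
        acquired.append(r)
    return True                   # lucky: touched only reachable, unheld resources

random.seed(42)
blocked = sum(not run_request(i, random.sample(RESOURCES, 5)) for i in range(1000))
print(f"{blocked} of 1000 requests blocked; {len(held)} resources stuck behind them")
```

With only one node of eight unreachable, close to half the requests wedge on their first unlucky acquisition, and each of them takes a few more resources out of circulation for everyone behind it. That is the lock-up I keep describing.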
OK, so this can happen. Is it worth designing around? Implementing redundant networks can reduce the frequency of partitions to very low levels. That means some additional cost for network hardware and software, but if it lets you skip replication for reachability (on top of the possibly hardware-assisted replication you already do for durability), the savings on node and storage hardware needed to reach fixed capacity/performance goals might be even greater. Let’s just eat the cost on those rare occasions when the unthinkable happens. Doesn’t that make more economic sense?

The first argument against that is that partitions even in redundant networks are more common than most people think, so the economic cost of such failures is likewise greater. I think the more compelling argument, though, is that a better tradeoff is practically always available. Any CA system which degrades messily into zero capacity when a partition occurs can easily be replaced with a CP system which degrades cleanly to half capacity under the same conditions. The technology for quorum-enforcing CP systems is mature and readily accessible, and the configurations and running costs are exactly the same, so there’s no good reason to implement a CA system instead. Every CA system you see is the result not of diligent design but of (often willful) ignorance. CA systems can exist; they just shouldn’t.
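For contrast, the quorum rule that turns such a system into a CP one is small enough to sketch here. This is illustrative only; the node names and cluster size are invented, and a real implementation (HACMP’s DGSP, leases, fencing, and so on) has to do careful failure detection underneath:

```python
CLUSTER = {"nodeA", "nodeB", "nodeC", "nodeD", "nodeE"}

def has_quorum(reachable_peers, me="nodeA"):
    """True if this node plus the peers it can reach form a strict majority."""
    visible = {me} | (reachable_peers & CLUSTER)
    return len(visible) > len(CLUSTER) // 2

def handle_request(request, reachable_peers):
    if not has_quorum(reachable_peers):
        # CP behavior: fail fast (or force the node down) instead of holding
        # locks and waiting out the partition.
        raise RuntimeError("minority partition: request refused")
    return do_work(request)       # hypothetical application logic

def do_work(request):
    return f"handled {request}"

print(handle_request("read x", {"nodeB", "nodeC"}))   # 3 of 5 visible: served
try:
    handle_request("read x", {"nodeB"})               # 2 of 5 visible: refused
except RuntimeError as err:
    print(err)
```

The majority side keeps serving at reduced capacity; the minority side fails every request cleanly instead of wedging, which is exactly the degrade-to-half-capacity behavior described above.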
I’ve long since decided that the distinction between CP and CA is nonsensical. As many others have said, “pick two” is really saying “given that partitions happen, pick C or A”.
But maybe you’re saying something new? A CP system that is “allowing the conflict and hoping to perform some sort of resolution later” isn’t much of a “C” system. How is this different from “AP”?
You point out that systems exist that allow you to do quorum operations when possible, falling back to non-quorum operations under failure scenarios. This is also related to Brewer’s “harvest” concept. One issue with this is that the set of systems that allow both quorum and non-quorum ops does not overlap at all with the set of systems that allow atomic operations on two keys/documents/whatever. Effectively, this all applies to the “C” in CAP, but not the “ACI” in ACID.
If you need ACID transactions across “things”, then trying to cram those into current AP databases is an unpleasant thing, perhaps more unpleasant than the loss of uptime. I think that is the point that Stu was trying to make. Yes, log-shipping is common and is effectively AP-style replication, but Local (C)AP and WAN C(AP) is a nice hybrid model that has had a lot of success for these kinds of apps.
If you don’t need ACID, then there are a lot more options for you, AP systems included.
A CP system that is “allowing the conflict and hoping to perform some sort of resolution later” isn’t much of a “C” system. How is this different from “AP”?
It’s an AP system, not CP, which is why I identify it as such in the same sentence that you partially quoted.
If you need ACID transactions across “things”, then trying to cram those into current AP databases is an unpleasant thing, perhaps more unpleasant than the loss of uptime.
Absolutely. That’s why there’s a place for CP systems – like VoltDB, for example.
Local (C)AP and WAN C(AP) is a nice hybrid model that has had a lot of success for these kinds of apps.
I assume that you mean the thing in parentheses in each case is absolute, and the rest subject to being traded away. If that’s the case, then yes, I completely agree, and in fact if you look over on Stu’s site you’ll see a comment from me (preceding this post) that expresses that agreement in almost the same terms that you just used. CP within a site plus AP between sites is not just common; I’d say it’s even a dominant paradigm for the distributed enterprise.
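For what it’s worth, here is a minimal sketch of that hybrid (all names invented): a write succeeds only when a majority of the local replicas acknowledge it, which is the CP-within-a-site part, while the other site is fed asynchronously and is allowed to lag or diverge during a WAN partition, which is the AP-between-sites part:

```python
from queue import Queue

LOCAL_REPLICAS = ["local1", "local2", "local3"]
wan_queue = Queue()               # writes shipped to the remote site when it's reachable

def local_write(key, value, ack_from):
    """CP step: commit only if a majority of local replicas acknowledge."""
    acks = len(set(ack_from) & set(LOCAL_REPLICAS))
    if acks <= len(LOCAL_REPLICAS) // 2:
        raise RuntimeError("no local quorum: write refused")
    wan_queue.put((key, value))   # AP step: ship asynchronously, reconcile later if needed
    return "committed locally"

print(local_write("cart:42", "3 items", ack_from=["local1", "local3"]))
```

Until the WAN shipper drains that queue, the two sites can serve reads that disagree; accepting that is precisely the A-over-C choice being made between sites.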
I should probably do this as an update, but I’m too lazy. Astute readers might have noticed that, in the comments to Stu’s post, I say that CA is a valid alternative. Here, I say it shouldn’t exist. The part about “CP in drag” is the key to that apparent contradiction. CA as an application-level programming model is fine, if it’s supported by a CP infrastructure that does quorum enforcement and failover and such for you. What’s not fine is “true CA” unsupported by any such infrastructure and completely vulnerable to the cascading-failure scenario I outline above. That’s the type of system that shouldn’t exist outside of a developer’s sandbox. I apologize for any confusion that my failure to make that distinction earlier might have caused.
Using your definitions, isn’t it fairly easy to transform a CA system into a CP system? Just make nodes capable of determining whether they are part of a quorum, and have them automatically fail if not.
In fact, I proposed such an enhancement a few days ago to someone who is building a CA system. I wasn’t trying to convert his CA system into a CP system, though; rather, I wanted to ensure that it behaved sanely in the face of partitions. Reading your post, I end up thinking that the distinction between CA and CP is fairly academic.
Exactly, David. There’s some complexity in detecting failures etc., but turning a CA system into CP by enforcing quorum is very well understood territory. Between that and the ugliness of CA behavior when the (by definition) unanticipated partition does occur, it’s hard for me to imagine a legitimate case for staying CA. I don’t think there is any such case where requests block instead of failing when they need a resource held by an unreachable node, so the failing-requests model has to be applied rigorously throughout the system. In addition, it has to be a case where it’s essential that every non-failing node remain up to serve those requests which can still be served on the reachable part of the network. That can only be the case if clients cannot reconnect elsewhere, which implies very limited client capability.

The only common use case I can think of that meets these requirements might be a distributed point-of-sale system where each client connects to a regional DC. In that case I think the “AP over CP” approach would still be preferable, especially since the fallback for any request that the regional DC can’t handle would be “manual AP” anyway. What’s “manual AP”? That’s writing down credit-card details and reconciling later, as happened to me recently when my wife’s car had to be towed and the tower’s POS device wasn’t working. Even that case doesn’t seem to be a good fit for CA, and that’s the closest I can think of.
So I don’t think the difference is entirely academic. As with building bridges, the difference between a way that works and a way that doesn’t is very relevant in the real world. I would say it’s not the difference that’s academic but CA itself. It’s a mere curiosity, useful to round out and illustrate the CAP problem space, and it’s good for no other purpose. The only problem is that people keep *implementing* that curiosity, harming their users, and folks like Mike/Daniel/Dennis/Stu keep giving them material with which to rationalize their lazy or ignorant choices. I think that would be malpractice if our profession had such a concept, and that’s why I oppose it so strongly.