Stonebraker and the CAP Theorem

Jeff Darcy November 5, 2009 12:57

Michael Stonebraker has lobbed a bomb into the NOSQL camp. He makes a perfectly good point that NOSQL has less to do with SQL than with ACID – a point I’ve tried to make many times myself, and which is reflected in the recent adoption of “Not Only SQL” as a preferred anti-acronym. Then he goes off the rails a bit by talking about scalability while showing no awareness of the CAP Theorem.

The CAP Theorem is a classic triangle; you can have any two of Consistency, Availability, and Partitionability (more commonly called partition tolerance). Availability in this sense doesn’t mean eventual availability any more than consistency means eventual consistency; it means the timely availability of data when you ask for it. Similarly, partitionability doesn’t just mean that there are multiple nodes (that’s just scalability) but that some might be unreachable.

I’m sure Stonebraker knows all this, but for some reason he seems to ignore it as he argues that you don’t need to sacrifice consistency for availability. He’s right . . . if you’re willing to sacrifice partitionability instead. He almost makes this connection when he points out that auto-sharding databases do exist and that many NOSQL stores do have distributed operation (i.e. partitionability) as a focus, but seemingly fails to appreciate the significance of that juxtaposition within a CAP conceptual framework. Yes, an auto-sharding database built with consideration for all four of his performance-limiting features can maintain C and A, but that was an obvious result already and it’s not relevant to the large number of NOSQL systems which are about A and P. (Some systems do concentrate on C and P, but they seem much less common or fashionable right now.)

His supposed example is flawed not only by that, but also by the fact that of course an in-memory database will get great pseudo-TPC numbers. Unfortunately, real TPC results require durability, and only a minority of serious real-world datasets will economically succumb to memory-based approaches unless and until CPU/memory/bandwidth balances change more than they have or are likely to. “Hooray” for those who can benefit, “who cares” for everyone else.
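
To make the triangle a little more concrete, here is a toy Python sketch of the choice a replicated store faces once a partition happens. It is only an illustration under simplifying assumptions (two replicas, one value, synchronous replication when the link is up), and the names `Replica`, `Cluster`, and `demo` are hypothetical; it does not model how any particular database behaves.

```python
class Replica:
    """One copy of a single stored value (hypothetical toy type)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two replicas of one value; `mode` decides what is given up in a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when the replicas cannot reach each other

    def write(self, replica, value):
        """Try to write via `replica`; return True if the write was accepted."""
        peer = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Link is up: update both copies, so C and A both hold.
            replica.value = value
            peer.value = value
            return True
        if self.mode == "CP":
            # Keep the copies identical by refusing the write (gives up A).
            return False
        # AP: accept the write locally and let the copies diverge (gives up C).
        replica.value = value
        return True

    def consistent(self):
        return self.a.value == self.b.value


def demo(mode):
    """Write once normally, then once during a partition."""
    cluster = Cluster(mode)
    cluster.write(cluster.a, "v1")
    cluster.partitioned = True
    accepted = cluster.write(cluster.a, "v2")
    return accepted, cluster.consistent()
```

Calling `demo("CP")` gives a rejected write with the replicas still in agreement, while `demo("AP")` gives an accepted write with the replicas disagreeing until they can reconcile. A system that never partitions never reaches either branch, which is the C-and-A corner.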

In the end, I think Stonebraker’s attempted criticism of NOSQL actually validates it. The flaws in his own “have your cake and eat it too” analysis show that CAP is very real, and that NOSQL folks are making a very reasonable choice when they choose to abandon C. It’s not the only reasonable choice, not even for the environments where those solutions are often developed (see Yahoo’s PNUTS for yet another perspective on the tradeoffs involved), but the people making it do know what they’re doing. Why can’t we all just get along?

3 Responses to “Stonebraker and the CAP Theorem”

  1. Sean on 18 Nov 2009 at 12:42 am

    Great article!

    An interesting point that Justin Sheehy (@justinsheehy) made in his talk to no:sql(east) was that sacrificing consistency doesn’t mean you have license to lose data – just that you may not have the same version of data at all points in your system at all times. I think “data loss” is what people deep in OnlySQL think when they hear “eventual consistency”. He also told me in a private conversation that in the case of Riak and some other Dynamo-inspired stores, “eventual consistency” means milliseconds – not seconds or minutes.

    Also, he points out that CAP is not “pick two”, it’s “know your tradeoffs”. Riak and Dynomite are interesting because they let you choose which tradeoffs to make and tune the system to your application’s needs. Riak even lets you make some choices at runtime.

  2. Jeff Darcy on 18 Nov 2009 at 7:45 am

    Good points, Sean. I shouldn’t have said “abandon C” above; having made the same point myself in multiple presentations, I should know better. Thanks for the reminder.

  3. Nathan Fiedler on 20 Nov 2009 at 12:27 pm

    Thanks for writing this post. Since reading Stonebraker’s biased article I’ve wanted to write my own critique. What’s interesting is how he avoids any specifics and clearly ignores the flip side of the coin: the benefits you gain when making different choices. Dynamo (and its ilk) and Bigtable/HBase have a lot to offer and Stonebraker didn’t address that at all.

    It should be noted that Stonebraker has a vested interest in spreading the FUD on NoSQL.
