Stonebraker and the CAP Theorem

Michael Stonebraker has lobbed a bomb into the NOSQL camp. He makes a perfectly good point that NOSQL has less to do with SQL than with ACID – a point I’ve tried to make many times myself, and which is reflected in the recent adoption of “Not Only SQL” as a preferred anti-acronym. Then he goes off the rails a bit by talking about scalability while showing no awareness of the CAP Theorem.

The CAP Theorem is a classic triangle; you can have any two of Consistency, Availability, and Partitionability (usually called partition tolerance). Availability in this sense doesn’t mean eventual availability any more than consistency means eventual consistency; it means the timely availability of data when you ask for it. Similarly, partitionability doesn’t just mean that there are multiple nodes (that’s just scalability) but that some might be unreachable.

I’m sure Stonebraker knows all this, but for some reason he seems to ignore it as he argues that you don’t need to sacrifice consistency for availability. He’s right . . . if you’re willing to sacrifice partitionability instead. He almost makes this connection when he points out that auto-sharding databases do exist and that many NOSQL stores do have distributed operation (i.e. partitionability) as a focus, but seemingly fails to appreciate the significance of that juxtaposition within a CAP conceptual framework. Yes, an auto-sharding database built with consideration for all of his four performance-limiting features can maintain C and A, but that was an obvious result already and it’s not relevant to the large number of NOSQL systems which are about A and P. (Some systems do concentrate on C and P, but they seem much less common or fashionable right now.)

His supposed example is flawed not only by that, but also by the fact that of course an in-memory database will get great pseudo-TPC numbers. Unfortunately, real TPC results require durability and only a minority of serious real-world datasets will economically succumb to memory-based approaches unless and until CPU/memory/bandwidth balances change more than they have or are likely to. “Hooray” for those who can benefit, “who cares” for everyone else.
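
To make the tradeoff a bit more concrete, here is a toy sketch in Python. It is not taken from Stonebraker’s article or from any real datastore; the `Replica`, `Cluster`, and `demo` names are all made up for illustration, and a “partition” is just a flag that stops the two replicas from talking to each other. The only point is what happens to a write once that flag is set: a C+P system refuses it, while an A+P system accepts it and lets the replicas diverge.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical)."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas joined by a link that can be cut (a partition)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.left = Replica()
        self.right = Replica()
        self.partitioned = False

    def write(self, replica, key, value):
        """Return True if the write was accepted, False if it was refused."""
        peer = self.right if replica is self.left else self.left
        if not self.partitioned:
            # Healthy link: update both copies, so C and A both hold.
            replica.data[key] = value
            peer.data[key] = value
            return True
        if self.mode == "CP":
            # Keep the copies identical by giving up availability.
            return False
        # AP: stay available by accepting the write locally; copies diverge.
        replica.data[key] = value
        return True

    def consistent(self):
        """True when both replicas hold exactly the same data."""
        return self.left.data == self.right.data


def demo(mode):
    """Write once, cut the link, write again; report (accepted, consistent)."""
    cluster = Cluster(mode)
    cluster.write(cluster.left, "k", "old")
    cluster.partitioned = True
    accepted = cluster.write(cluster.left, "k", "new")
    return accepted, cluster.consistent()
```

Here `demo("CP")` returns `(False, True)` and `demo("AP")` returns `(True, False)`. The C+A case is simply the branch where the link never goes down, which is exactly the assumption being smuggled in when someone claims you can keep both.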

In the end, I think Stonebraker’s attempted criticism of NOSQL actually validates it. The flaws in his own “have your cake and eat it too” analysis show that CAP is very real, and that NOSQL folks are making a very reasonable choice when they choose to abandon C. It’s not the only reasonable choice, not even for the environments where those solutions are often developed (see Yahoo’s PNUTS for yet another perspective on the tradeoffs involved), but the people making it do know what they’re doing. Why can’t we all just get along?

Parade of FAIL

One of the items that popped up in my morning scan of the news was a list of top failures in computing. The list is a bit of a FAIL itself, so I’ll continue the train of thought here. The thing about failure is that it can be instructive – often more instructive than success, which probably says something interesting about human psychology but I don’t know what. In fact, many of the things I’ll mention here aren’t really failures in the sense that they were misguided or doomed from the start. Some of them were seminal ideas and great successes in their time, but times changed and now it’s time to move on. Here are some of my suggestions, but first I have to make some exemptions because otherwise the list will just be too big.

  • Networking has generated a particularly impressive string of dead bodies – DECnet, NetWare, OSI, X.25, ATM, FDDI, token ring, on and on and on. How many routing protocols have risen and then fallen again? How many TCP speed tweaks and congestion-control methods? How many autoconfiguration and NAT-traversal methods? How many firewall technologies? That will just have to be a separate list.
  • I’m not even going to start on web/dot-com failures. There are whole blogs, much busier than this one, devoted to that.

On with the real list, after the fold.

Genetics Gets More Complicated

This story about microRNA is not only highly significant, but it’s also a wonderful tale of persistence and collaboration and everything that’s good about the scientific community. On top of that, it’s well told by the author, so I had to share.

Ambros’s work on that bizarre mutant provided one of the first signs that RNA might be much more important than anyone had suspected, but not until 2001 did the full story start to unfold. That is when studies finally convinced scientists that the minuscule RNA snippets they had taken to calling “microRNA” were regulating cellular and genetic processes throughout the human body and were critical factors in the determination of health and disease.

Another explanation is that, as with any remarkable scientific discovery, finding microRNA required just the right combination of talent, circumstance, and luck. Ambros found a perfect collaborator in his wife, Candy Lee, who was a lab technician. As Baltimore describes them (having worked with both), they follow the data rather than the scientific fashions; they are both technically adept in the laboratory; and “they have never been ambitious to the point of its getting in the way of reality.” This is not to say that they lack the drive to do good science, but that “they’re not worrying about the trappings of science,” Baltimore says.

They even recruit one of their kids to do some computer programming for the project at one point. How cool is that?

New Records

Last month, this site set some new records. The most obvious one is that WordPress stats report 8,763 visits for October, vs. 8,701 for the previous record holder, June. AWStats shows a slightly different picture, though: in particular, September was already beating June in some categories. Here are some highlights:

  • 26,954 visits (September 23,868; June 22,969)
  • 13,524 unique visitors (September 11,609; June 11,950)
  • 69,992 pages (September 62,572; June 57,516)

What I like best about these results is that they’re not the result of a single huge spike like I got for my “Unemployed!” article back in June. (Yay, I’m no longer most famous for having lost my job!) Sure, I wouldn’t have set a record if not for the scary-pumpkin traffic, but there was a pretty consistent level of traffic all through the month. My posts on software cargo cults, consistent hashing, and particularly comparing key/value stores all generated mini-bumps of their own. It’s nice to know not so much that people are reading this site, but that the ideas I’m writing about (most of which I got from elsewhere) seem to be the subject of increased interest. I’m the surfer, not the wave. I know that, and I know these are also trifling numbers compared to the people really generating the wave, but it’s still nice to feel like this site is part of something and not just another personal waste of time. I’m wasting lots of people’s time now!

ZFS gets dedup

Looks like Bonwick and Co. have been up to some neat stuff. I’m not even going to be snarky this time; it even looks like they’ve thought through the hash-collision tradeoffs, made some pretty reasonable choices, and refrained from the overzealous claims that marred the ZFS and FISHworks introductions. Nice work, guys!
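
For a rough feel of the hash-collision tradeoff, here is a back-of-the-envelope sketch using the standard birthday-bound approximation. The assumptions are mine, not taken from the ZFS announcement: a 256-bit hash (such as SHA-256) whose outputs behave as uniformly random, and 2^35 unique 128 KiB blocks, which works out to 4 PiB of deduplicated data. The function name `collision_probability` is just illustrative.

```python
import math


def collision_probability(num_blocks, hash_bits=256):
    """Birthday-bound estimate of at least one collision among num_blocks
    uniformly random hash values: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


# Assumed pool: 2**35 unique blocks of 128 KiB each, i.e. 4 PiB.
blocks = 2 ** 35
p_256 = collision_probability(blocks, hash_bits=256)
p_64 = collision_probability(blocks, hash_bits=64)
print(f"256-bit hash: {p_256:.3e}")
print(f" 64-bit hash: {p_64:.3e}")
```

Under those assumptions the 256-bit figure is far smaller than any plausible hardware error rate, while the same pool with a 64-bit hash would be all but guaranteed to collide somewhere. That gap is the reason trusting a strong hash can be a reasonable choice, and the reason a weaker checksum would need a byte-for-byte verify step behind it.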