(This is a cut-and-paste of a comment I made on Dennis Forbes’s Getting Real about NoSQL and the SQL-Isn’t-Scalable Lie – found via MyNoSQL – in which Dennis responds to others’ pro-NoSQL silliness by engaging in some silliness of his own.)
I was going to try to be polite, until I saw your slam against All Things Distributed as a NoSQL advocacy site. What rubbish. Vogels and company know more about scalability than just about anyone, and more about using the right tool for the right job – which is why they provide RDS as well as SimpleDB. Even if that weren’t the case, what you’re engaging in is mere ad hominem. Vogels’s definition of scalability is right or wrong on the merits, not based on who he is or what other opinions you attribute to him. Might as well dismiss all of *your* definitions and claims based on your being an RDBMS advocate. As it happens, I was making Oracle scale before there was a NoSQL, before there was even RAC (it was OPS back then), and from then until now I’ve always used a very similar definition of scalability: maintaining a ratio of work done to resources used. For you to offer a different definition *is* the same sort of self-serving wordplay which you criticize in others.
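To make the "ratio of work done to resources used" definition concrete, here is a minimal sketch in Python. The function name and all of the numbers are hypothetical, chosen only for illustration; they are not measurements from any real system.

```python
def scaling_efficiency(base_work, base_resources, work, resources):
    """Compare work-per-resource at a larger size against a baseline.

    1.0 means the ratio of work done to resources used was maintained
    (linear scaling); lower values mean each added resource buys less.
    """
    base_ratio = base_work / base_resources
    ratio = work / resources
    return ratio / base_ratio


# Hypothetical (servers, requests per second) pairs, for illustration only.
measurements = [(5, 5000), (20, 19000), (100, 90000)]

base_servers, base_rps = measurements[0]
for servers, rps in measurements:
    eff = scaling_efficiency(base_rps, base_servers, rps, servers)
    print(f"{servers:>3} servers: {eff:.2f} of baseline work per server")
```

By this definition, a system that holds close to 1.0 as servers are added is scaling well, whatever its absolute throughput happens to be.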
Second point: equating “highly interrelated” with “relational” doesn’t do justice to either, and characterizing social-media workloads as “largely unrelated islands of data” is just laughable. Friend relationships, recommendations, and the like create a much *higher* level of data linkage at social-media sites than at banks, which are your other example.
Third point: your notions of I/O performance are way off. At the low end of the scale, you claim only 30MB/s for some of the larger instance types. I’ve personally measured 3x that on smaller instance types, and also observed that EC2 is notably bad in terms of disk I/O relative to other kinds of performance. I’ve written about that, as have others such as Randy Bias at Cloudscaling. If you want to get a real feel for I/O performance, you’d be better off measuring some of the more powerful instances at Rackspace or GoGrid, or even better for current purposes would be direct measurement of RDS at Amazon.
At the high end of the scale, you’re also off. Single-machine I/O capabilities might be enormous by your standards, but not by mine or by those of anyone who has worked with modern storage systems. 700MB/s for an ultra-pricey FusionIO card? Try 40GB/s. I’ve personally developed code to do that, and it wasn’t on a single system even at a million-dollar price point. It was using exactly the kind of horizontal scaling techniques that you seem to think are unnecessary.
Fourth point: saying that RDBMSes are scalable, because you can (a) throw a ton of money at Oracle/Sybase or (b) do most of the work sharding and replicating something like MySQL, is a bit disingenuous. By the same reasoning, SMP is also scalable because you can pay SGI/3Leaf a ton of money or write your own SDSM layer. It doesn’t really work that way. The better NoSQL solutions – certainly not all of them – are inherently scalable in the sense that they scale in human terms as well as machine terms. In other words, they allow you to add capacity without spending tons of money *or* chewing up months of developer/administrator time. Administering a hundred servers is very nearly as easy as administering five. That’s a definite advantage over any RDBMS clustering I’ve ever seen.
Fifth and last point: defining scalability in terms of “highest realistic level of usage” and “maintaining acceptable service level” is also a big pile of weasel words. Just because most users won’t need the scale of a Facebook or a LinkedIn doesn’t mean they shouldn’t choose solutions that scale well. Such solutions are also likely to be more cost-effective at smaller scale, and some small percentage of users will eventually scale up to the point where the RDBMS cost/benefit curve levels off or falls. “Plan for success” is a perfectly valid business principle. More importantly, one of the whole points of the NoSQL “movement” is that people should think hard about what “acceptable service level” means to them in terms of performance, in terms of CAP, and so on. A solution that far exceeds the necessary service level in some terms (e.g. consistency) but exacts a cost for it is not an ideal solution.
You’re absolutely right that many NoSQL advocates have made immoderate and even ridiculous statements. Of those you link to, I’ve taken a few to task myself. Every time a new technology catches on, no matter how legitimately, it will attract a few loud know-nothings. I invite you to take them on and correct some of their misstatements, but becoming their mirror image only tends to validate their extreme opposition to RDBMS traditionalism. Both types of systems, plus more that we haven’t even discussed, have their advantages and drawbacks.
NOTE at 4:49pm: it appears that Dennis is being very heavy-handed about moderating comments. Considering how controversial the topic is, and all of the links I’ve seen, I don’t think I’m going out on a limb too much by saying that there have probably been more than three comments since 9am this morning. That’s all that have passed moderation, though, including two that were posted after mine (so he’s clearly not just working through a long queue). I’m not sure whether that’s better or worse than lobbing bombs from a blog that doesn’t even pretend to allow comments, but it’s right down there on the low end of the scale either way.