One of the most annoying things about many NoSQL projects is the relentless promotion of certain projects. The competition for users, contributors, potential investors and customers, speaking and consulting engagements, and general attention is fierce. People really really want projects that they’re involved with to succeed, and I can respect that they’re willing to fight for that . . . but sometimes they fight dirty and I don’t respect that so much. Sadly, one example that recently appeared is Basho’s comparison of their own product Riak to Cassandra. Now, I know the Basho guys aren’t stupid. Riak does basically work, and stupid people wouldn’t have gotten it that far. Some of the explanations on their site of things like consistent hashing and vector clocks are quite good. Even the article I’m about to address demonstrates that they actually do know a lot about this stuff . . . so ignorance is not a likely excuse for its misrepresentations. They must have known their comparisons were inaccurate, I know some of the inaccuracies have been addressed by others, they’ve had ample time to make corrections, and yet the misrepresentations remain. Let me address a few so people can see what I mean.
When you add a new node [in Riak], it immediately begins taking an equal share of the existing data from the other machines in the cluster, as well as an equal share of all new requests and data. This works because the available data space for your cluster is divided into partitions (64 by default).
When you add a machine to a Cassandra cluster, by default Cassandra will take on half the key range of whichever node has the largest amount of data stored. Alternatively, you can override this by specifying an InitialToken setting, providing more control over which range is claimed. In this case data is moved from the two nodes on the ring adjacent to the new node. As a result, if all nodes in an N-node cluster were overloaded, you would need to add N/2 new nodes. Cassandra also provides a tool (‘nodetool loadbalance’) that can be run in a rolling manner on each node to rebalance the cluster.
Quick question: does dividing keys into 64 partitions give as much load-balancing flexibility as dividing at any point in the 128-bit space provided by MD5 (which is what you’d have with Cassandra’s RandomPartitioner)? 64 is a particularly problematic number, not only because of what happens if you have more than 64 nodes but because even with fewer nodes the granularity is just too coarse. If a partition is overloaded, too bad. There’s even a comment in the admin documentation that comes with Riak saying that you should set this to several times the number of nodes in your cluster . . . as though that’s static. What if you set it appropriately for a four-node cluster and then grew to forty? You won’t find them discussing those issues in the comparison but, oh, they sure are quick to speculate about needing to add N/2 nodes with Cassandra. At least they mention “nodeprobe loadbalance” but they get the command wrong and don’t seem to appreciate what it really does. Go read CASSANDRA-192 for yourself, and you’ll see that it can actually balance load much better than Riak with its partitions ever could.
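To make the granularity point concrete, here’s a minimal sketch (my own illustration, not either project’s actual code) of why 64 fixed partitions can’t balance a forty-node cluster, while arbitrary tokens in a 2^128 space can split ranges wherever the load actually is:

```python
import hashlib

PARTITIONS = 64  # fixed ring size, the Riak default per the comparison

def fixed_partition(key: bytes) -> int:
    # Fixed granularity: every key lands in one of 64 coarse buckets.
    return int.from_bytes(hashlib.md5(key).digest(), "big") % PARTITIONS

# With 40 nodes and 64 partitions, partitions can't divide evenly:
# some nodes own 2 partitions and others own 1 -- a built-in 2x load
# imbalance that no amount of shuffling can fix.
owners = [p % 40 for p in range(PARTITIONS)]
counts = {n: owners.count(n) for n in set(owners)}
print(sorted(set(counts.values())))  # -> [1, 2]

# A token-based ring (the idea behind RandomPartitioner) can place a
# node's token at ANY point in the 2**128 space, so ranges can be cut
# exactly where needed -- e.g. perfectly evenly for any node count.
def even_split_token(n_nodes: int, i: int) -> int:
    return (2**128 // n_nodes) * i
```

The numbers here are illustrative; the point is only that a fixed partition count caps how finely load can ever be redistributed, while a continuous token space does not.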
Cassandra has achieved considerably faster write performance than Riak.
That said, the Riak engineering team has spent a lot of time optimizing Riak performance and benchmarking the system to ensure that it stays fast under heavy load, over long periods of time, even in the 99th-percentile case. We like speed, just not at the expense of reliability and scalability.
In other words, “We’re forced to admit that Cassandra is faster but we’ll spread FUD to distract from that.” I’m pretty sure the Cassandra folks are also unwilling to compromise on reliability and scalability for the sake of speed. The authors of the comparison have in no way shown that Riak even has an advantage in reliability or scalability, but their conclusion is worded to imply – without actually having the guts to state outright – that they made a responsible tradeoff and Cassandra made an irresponsible one. That’s just disgusting.
Riak tags each object with a vector clock, which can be used to detect when two processes try to update the same data at the same time, or to ensure that the correct data is stored after a network split.
In contrast, Cassandra tags data with a timestamp, and compares timestamps to determine which data is newer. If a client’s timestamp is out of sync with the rest of the cluster, or if a client waits too long between reading and writing data, then it is possible to lose the data that was written in between.
The difference between timestamps and vector clocks is a legitimate one, and in this case I think the Riak folks are right to bring it up. I personally would prefer a vector-clock-based approach. The “waits too long . . . in between” is kind of FUD-ish, though. This is a potential problem in any eventually consistent system, including those that use vector clocks (which resolve some but not all conflicts). Riak does provide the building blocks for a good solution, in the form of their X-Riak-Vclock extension and user-driven conflict resolution, but they don’t even allude to that. I guess “Cassandra might screw up your data” was easier than discussing a point that might actually have favored them.
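Since the timestamp-vs-vector-clock distinction is the one point here worth taking seriously, a small sketch may help (this is my own toy illustration of the general technique, not Riak’s X-Riak-Vclock format):

```python
# Toy vector clocks: each entry maps an actor id to a counter of how
# many updates that actor has made.

def vc_merge(a: dict, b: dict) -> dict:
    """Combine two clocks after conflict resolution."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def vc_descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(k, 0) >= v for k, v in b.items())

def compare(a: dict, b: dict) -> str:
    if vc_descends(a, b) and vc_descends(b, a):
        return "equal"
    if vc_descends(a, b):
        return "a newer"
    if vc_descends(b, a):
        return "b newer"
    return "conflict"  # concurrent writes: surface both to the client

# Two clients update the same object from the same ancestor:
ancestor = {"client1": 1}
update_a = {"client1": 2}                # client1 writes again
update_b = {"client1": 1, "client2": 1}  # client2 writes concurrently
print(compare(update_a, update_b))       # -> conflict
```

A last-write-wins timestamp scheme would silently keep whichever update had the later clock reading; the vector clocks detect that the writes were concurrent and leave resolution to the application, which is exactly the “building blocks for a good solution” mentioned above.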
Riak buckets are created on the fly when they are first accessed. This allows an application to evolve its data model easily.
In contrast, the Cassandra Keyspaces and Column Families (akin to Databases and Tables) are defined in an XML file. Changing the data-model at this level requires a rolling reboot of the entire cluster.
Another completely fair point. Supercolumns have their uses, but they’re really no substitute for dynamic bucket/ColumnFamily creation.
In contrast, Cassandra has only one setting for the number of replicas
Absolutely, positively untrue. Cassandra in fact allows you to specify the number of replicas on a per-request basis, to trade off performance vs. protection from failures. This is such a commonly discussed and central feature that I find it impossible to believe the authors weren’t aware of it. Interpreting the abundant information on this topic as “only one setting” is, again, reprehensible.
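For readers unfamiliar with the feature, here’s a sketch of the idea behind per-request consistency levels. The names and numbers are illustrative, not Cassandra’s actual API: the replication factor N is fixed per keyspace, but each read or write independently chooses how many of those N replicas must respond.

```python
# Sketch of per-request consistency levels (illustrative, not the
# real Cassandra client API).

N = 3  # replication factor: how many copies exist, fixed per keyspace

def required_acks(level: str) -> int:
    """How many replicas must respond for a request at this level."""
    return {"ONE": 1, "QUORUM": N // 2 + 1, "ALL": N}[level]

# Each request picks its own durability/latency trade-off:
fast_write = required_acks("ONE")     # 1 ack: fastest, least durable
safe_write = required_acks("QUORUM")  # 2 acks: survives one replica loss
paranoid   = required_acks("ALL")     # 3 acks: slowest, strongest

# Quorum reads and quorum writes overlap (R + W > N), so a quorum
# read is guaranteed to see the most recent quorum write.
assert required_acks("QUORUM") * 2 > N
print(fast_write, safe_write, paranoid)  # -> 1 2 3
```

That per-request dial is precisely the abundant, well-documented knob the comparison reduces to “only one setting.”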
As I hope readers can see, I’m not just rejecting every criticism of Cassandra. I like Cassandra, I like the developers, but I’m no fanboi even on projects I’m more directly involved in. There’s always room for improvement. What I do object to, though, is criticism given without due diligence. I might even be wrong about Basho’s load distribution, for example, but at least I tried to find the facts. I expect at least that much diligence and objectivity from people who are posting their findings on a company-sponsored website as part of their day jobs, and frankly I don’t think those traits are very evident in Basho’s comparison.