Over the last few days, I’ve been using some of my spare time to benchmark several of the key/value stores that are out there. I don’t think all that much of memory-only stores, because a store designed to be persistent can do everything a memory-based store can do by the simple expedient of running it on top of a RAM-backed filesystem. In a world where horizontal scaling is the name of the game, the argument that a memory-based store can avoid syscall overhead or some such is much less compelling than the argument that the human cost of deploying a whole separate infrastructure outweighs the material cost of deploying a few more servers. Whether you agree or disagree with that reasoning, what I set out to compare was a set of persistent stores. They varied in lots of other ways, though. Here are the competitors and how they distinguish themselves:

  • Tokyo Tyrant is a single-node data store which would require an extra layer to make it into the kind of multi-node data store I think most people really want/need (more about that later). It also has a very simple data model and practically no security model, but everyone says it’s fast, so it’s a good baseline.
  • Chunkd is the lower layer of Project Hail, and is mostly equivalent to Tyrant except that it has a notion of multiple users and channel security (which was turned off for benchmarks).
  • Voldemort differs from Tyrant along the other axis – it still has a simple data model and no security features to speak of, though it does span nodes and supports multiple stores served through a single port.
  • Cassandra and Riak add richer (but different) data models to Voldemort’s functionality, but testing was for a very simple “one key maps to one value” model.
  • Tabled is the upper layer of Project Hail, and is more of an S3-equivalent large object store with full multi-user access control etc. I figured it would perform worse than the others (almost but not quite true) but I was curious to see how much of a penalty was associated with the extra functionality.

To test performance, I wrote a Python script which could speak to the interfaces for any of these stores. I had to create such an interface for chunkd using the ctypes module, but that was no big deal. I then ran tests as follows:
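For readers who haven’t used ctypes, the wrapping pattern is simple: load the shared library, declare each C function’s argument and return types, and call it like a Python function. Here’s that pattern demonstrated with libc’s strlen as a stand-in so the example is self-contained (chunkd’s actual client API isn’t reproduced here):

```python
import ctypes
import ctypes.util

# Load the C library. For chunkd this would be its client library instead;
# libc is used here only so the example runs anywhere.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature explicitly so ctypes marshals arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(s: bytes) -> int:
    """Call the C function through the ctypes wrapper."""
    return libc.strlen(s)

print(c_strlen(b"hello"))  # 5
```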

  • Use two virtual machines on my desktop. I’ll re-compare with EC2 instances, including multiple servers where appropriate, soon.
  • Run ten client threads on one machine against all the server components on the other, for one minute per test.
  • Test write, warm read, and cold read (i.e. daemon killed, all caches dropped, daemon restarted).
  • Test 100-byte and 1000-byte values. I probably should have tested 10KB or larger (and will in future rounds), but for now I was interested in two small sizes.
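The driver itself doesn’t need to be fancy. Here’s a stripped-down sketch of the kind of write-test loop involved, assuming a store object that exposes a put(key, value) method (the method name is mine; each store’s real interface differs):

```python
import threading
import time

def run_clients(store, num_threads=10, duration=60.0, value=b"x" * 100):
    # Each thread issues back-to-back puts until the deadline; the
    # aggregate count divided by the duration gives operations/second.
    counts = [0] * num_threads
    deadline = time.time() + duration

    def worker(idx):
        n = 0
        while time.time() < deadline:
            store.put(b"key-%d-%d" % (idx, n), value)
            n += 1
        counts[idx] = n

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts) / duration
```

The warm-read and cold-read variants just swap the put for a get over keys written earlier; the cold case additionally restarts the daemon and drops OS caches before reading.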

Here are the performance highlights. I’ll post full numbers after I re-run the tests on EC2 as I mentioned; all I’ll say for now is that the per-second rates for all tests ranged from low hundreds to low thousands.

  • Most of the candidates were two to four times faster for warm reads than for writes, except for Tyrant, which was only about 10% faster.
  • Half of the candidates (Chunkd, Cassandra, Riak) were two to three times as fast for warm reads as for cold. The other half (Tyrant, Voldemort, Tabled) were only 0-20% faster.
  • Tyrant was the clear winner, more than 20% faster than its closest competitor even for the warm-read cases and showing very little drop-off for either writes or cold reads – making it up to four times as fast in some tests.
  • Chunkd almost kept up with Tyrant in the warm-read case, but fell far behind otherwise. It still took second place in every test, though.
  • After that, things slowed down pretty consistently. Cassandra was only 30-85% as fast as Chunkd, Voldemort only 30-80% as fast as Cassandra (except for the larger cold-read test, where it actually pulled ahead by 25%), and Tabled only 50-90% as fast as Voldemort. Trailing the pack by a considerable margin was Riak, which was only 15-45% as fast as Tabled.
  • By the time all was said and done, Tyrant was 8-24 times as fast as Riak.

The fairly obvious conclusion is that the performance differences between alternatives in this space are definitely big enough to be worth thinking about. A top-to-bottom ratio of 2:1 or 3:1 might be made up with tuning or by modifying applications to take advantage of each store’s special features, but at 24:1 that starts to seem unlikely. Like Leonard Lin, I find Tyrant’s performance lead compelling enough to think that a consistent-hashing layer on top of Tyrant might kick ass. Like Leonard, I’m not convinced that LightCloud really delivers on that promise – though it’s interesting enough that I’ll probably include it in my next round of comparisons.
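To make concrete what such a consistent-hashing layer would do, here is a minimal sketch of the idea (the virtual-node count, the MD5-based hash, and the server names are my own choices for illustration, not anything taken from LightCloud):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: each server contributes several
    virtual points, and a key maps to the first point at or past its
    hash, so adding or removing one server only remaps that server's
    share of the keys."""

    def __init__(self, servers, replicas=64):
        # Sorted list of (point, server) pairs forming the ring.
        self.ring = sorted(
            (self._hash("%s:%d" % (s, i)), s)
            for s in servers for i in range(replicas)
        )
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(key):
        # Any well-mixed hash works; MD5's first 8 bytes as an integer.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

A client layer would then route each get/put to the Tyrant instance returned by node_for(key), which is roughly the promise a LightCloud-style layer makes.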

As I said, I still need to re-test in a real cloud environment to validate these results, plus I’ll test more sizes. I’ll probably also drop tabled and Riak in favor of LightCloud and maybe Keyspace. I’ll definitely publish the benchmark program and any ancillary pieces (such as my Python interface for chunkd), and go into more detail about exactly how everything was set up. I’ve found way too little information in this area that’s even empirical, let alone quantitative, so maybe I can get something started here to fill the void.