Looks like this is the week to re-hash old arguments. Nati Shalom: Memory is the New Disk for the Enterprise. He even tries to run some numbers, but they didn’t look very convincing to me so I decided to run my own. Here are some data points.

  • 8GB of DDR2-800 ECC RAM costs $249 or a bit over $30/GB.
  • A 147GB 15K RPM SAS drive costs $179 or around $1.20/GB.
  • A fairly generic 1U Xeon-based server with 2GB of RAM costs $1435.

Nati was using Cisco UCS systems. Somehow I doubt that either the systems themselves or the memory that goes in them is cheaper than generic stuff at NewEgg, so the above figures should be considered lower bounds for those prices. He also claims that a 4TB data set fits into 4 UCS chassis, which doesn’t seem right considering that the data sheet for the UCS C250 M1 “extended memory” server only claims 384GB. Now, 384GB in a 2U box is a beautiful thing, but it’s not 1TB. It also costs a lot more than Nati claims. In reality, you’ll need a dozen systems just for capacity. Since most of the data you’re accessing is going to be remote, you’re probably going to need more than that to get enough bandwidth to all that data. Yes, even with 10GbE. Even with QDR IB, which is 3x as fast for the same price. I’ll be generous, though, and stick with a dozen. In fact, I’m going to ignore NICs and switches and cables altogether. Let’s look at what you really need to satisfy Nati’s 4TB use case.
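
If you want to check that arithmetic, here’s the back-of-envelope version as a quick Python sketch. The only inputs are the prices above and the 384GB-per-server figure from the data sheet.

```python
# Back-of-envelope numbers for the 4TB case, using the prices above.

DATA_SET_GB   = 4 * 1024      # Nati's 4TB example
RAM_PER_BOX   = 384           # max RAM per UCS C250 M1, per the data sheet
RAM_PRICE_GB  = 249 / 8       # ~$31/GB for an 8GB DDR2-800 ECC module
DISK_PRICE_GB = 179 / 147     # ~$1.22/GB for a 147GB 15K SAS drive

servers_for_capacity = -(-DATA_SET_GB // RAM_PER_BOX)   # ceiling division
print(servers_for_capacity)          # 11 -> call it a dozen with headroom
print(RAM_PRICE_GB / DISK_PRICE_GB)  # RAM is roughly 25x disk, per GB
```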

  • RAM-based: $1435 * 12 servers + $249 * 500 8GB DIMMs = $142K
  • “Lean” disk-based: $1435 * 12 servers + $179 * 28 147GB disks = $22K
  • “Beefy” disk-based: $1435 * 12 servers + $249 * 12 extra 8GB DIMMs + $179 * 56 147GB disks = $30K (arithmetic spelled out below)
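
Spelled out as a sketch, assuming a dozen servers throughout and one extra 8GB DIMM per server in the “beefy” case:

```python
SERVER  = 1435    # generic 1U Xeon box with 2GB of RAM
DIMM_8G = 249     # 8GB DDR2-800 ECC module
SAS_147 = 179     # 147GB 15K RPM SAS drive

ram_based  = 12 * SERVER + 500 * DIMM_8G                 # all 4TB in DRAM
lean_disk  = 12 * SERVER + 28 * SAS_147                  # just enough spindles for 4TB
beefy_disk = 12 * SERVER + 12 * DIMM_8G + 56 * SAS_147   # extra cache + double the spindles

print(ram_based, lean_disk, beefy_disk)   # 141720 22232 30232
```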

The difference between the “lean” and “beefy” configurations is based on two observations. The first is that disk data can be cached in memory. Gasp! It might seem obvious to most of us, but somehow the RAM-cloud advocates always seem to “forget” that in their comparisons. In fact, a RAM-based system that can spill to disk isn’t all that different from a disk-based system that can cache in RAM, and most of the differences are negative. The second observation is that more spindles offer better transfer rates and (more importantly) more ops/second, and are often deployed even when capacity goals have already been met. Still not fast enough for your workload? Spend a few grand on another rank of disks. Repeat as necessary, for trivial cost compared to the all-RAM approach.
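
To put a rough number on that last point: the ~175 ops/second per 15K drive below is an assumed ballpark (it isn’t a figure from Nati’s post or the price list above), but it’s enough to show the shape of the trade-off.

```python
# How much does it cost to buy ops/second with spindles?
# IOPS_PER_DRIVE is an assumed ballpark for a 15K SAS drive.

IOPS_PER_DRIVE = 175
SAS_147        = 179

target_iops  = 5_000                                # hypothetical hot-data workload
extra_drives = -(-target_iops // IOPS_PER_DRIVE)    # ceiling division
print(extra_drives, extra_drives * SAS_147)         # 29 drives, ~$5.2K
# versus a ~$120K premium for putting the whole data set in RAM
```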

The RAM-based system would have to perform a lot better than the disk-based one to justify the 5x price differential. That would require that practically all of the data be hot all the time, which is generally unlikely and particularly so in Nati’s two examples. Both the online retailer and the airline reservation system are likely to have access patterns characterized by a small time-based window onto a much larger data set. It’s also not clear that these systems qualify as large any more, and the comparison only looks worse for RAM as systems get bigger. You don’t have to be Facebook to have millions of users accessing ten times more data than Nati’s use cases. There are at least a couple of dozen applications within Facebook that each have a million daily users, matched by many times more at standalone websites, in turn matched by many times more within corporations. Applications scaling from dozens of terabytes up to petabytes are more common than many people seem to think. There are applications that fit into Nati’s model – I saw a few in my HPC days – but in general if your data set is that large then most of it is going to be cold and most of it is going to be remote. When you’re already incurring the cost of a network operation, the system-level performance difference between disk that’s appropriately cached/prefetched in RAM and all-RAM is negligible compared to the cost difference (for the vast majority of workloads). Nine out of ten people who think they have a truly RAM-cloud-appropriate access pattern should be spending their money not on extra RAM but on smarter programmers.
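
One way to see the “negligible” part is to model a remote read as a network round trip plus whatever the serving node does locally. Every constant in this sketch is an assumed ballpark, there only to show the shape of the math:

```python
# Crude latency model for a remote read; every constant here is an
# assumed ballpark, not a measured number.

NET_RTT_MS   = 0.2     # assumed round trip through the network stack
RAM_MS       = 0.0001  # local DRAM access, effectively free at this scale
DISK_SEEK_MS = 5.0     # one 15K-RPM seek on a cache miss
HIT_RATE     = 0.99    # assumed for a well-cached, time-windowed workload

all_ram     = NET_RTT_MS + RAM_MS
cached_disk = NET_RTT_MS + HIT_RATE * RAM_MS + (1 - HIT_RATE) * DISK_SEEK_MS
print(round(all_ram, 3), round(cached_disk, 3))   # ~0.2ms vs ~0.25ms per remote access
# ~25% slower at the system level, for roughly a fifth of the hardware cost
```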

In the end, the numbers just don’t add up. Maybe when systems have at least a terabyte of RAM each and a network to match, but even then I remain skeptical. Of course making disk do RAM’s job can be slow. And making RAM do disk’s job can be expensive. People shouldn’t harp on one without recognizing the other. If you really want to get serious about optimizing data-access performance, here’s another bonus observation: a two-level operational/archival distinction is too confining and moving the line from disk/tape to RAM/disk is pointless. What people should really be thinking about is local RAM to remote RAM to SSD to fast disk to slow disk to cloud, or similar. Use all of the tools in the box, instead of one for everything.
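
If I had to sketch that ladder, it would look something like this; the latencies are order-of-magnitude guesses, not measurements.

```python
# A rough sketch of the tiers, ordered by access latency.  The numbers are
# order-of-magnitude guesses, there only to show how many rungs the ladder has.

tiers = [
    ("local RAM",  "~100 ns"),
    ("remote RAM", "~100 us"),   # a network round trip away
    ("SSD",        "~100 us"),
    ("fast disk",  "~5 ms"),     # 15K RPM seek
    ("slow disk",  "~10 ms"),    # 7.2K RPM seek
    ("cloud",      "~100 ms"),   # WAN round trip
]

for name, latency in tiers:
    print(f"{name:<10} {latency}")
```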