Caching is a neat trick. It's a really, really neat trick, using one level of the storage hierarchy to hide or work around slowness in the next, and we wouldn't be able to do much of what we do without caches. They're still a trick, though, because the illusion they create always fades eventually. Inconsistencies start to creep in, or the cost of maintaining consistency starts to outweigh the advantage of having caches in the first place. Even if you manage to avoid both of those problems, request rates and problem sizes practically always grow faster than cache sizes, until the caches are too small and you'd better have another solution ready. HPC folks have known this for a long time. They don't think much of CPU caches, because the kind of work they do blows through those in no time. Increasingly, their attitude toward in-memory caching of disk data is the same, because they tend to blow through those caches too. After a few rounds of this, it's natural to stop thinking of caches as performance magic and start thinking of them as just another tool in the box.

Some folks seem to have trouble learning that lesson. Even if they've learned it for computational tasks, they still cling to a belief that caches can last forever when it comes to I/O. Most recently, I've seen this in the form of people claiming that modern applications don't need fast I/O because they can just keep everything in (distributed) memory. The H-Store papers behind some of this advocacy make good points about OLTP specifically, but OLTP is only one kind of computing. While OLTP workloads are still extremely common and important, they account for an ever-decreasing share of storage capacity and bandwidth. Let's run some numbers to see how inapplicable those conclusions are to most people. A modern server such as the IBM x3650 M2 can accommodate 128GB in 2U. That's barely 1TB per rack once you account for the fact that some of the space has to go to a top-of-rack switch to provide bandwidth to that data, and that you have to replicate the in-memory data for some semblance of fault tolerance (though many would say even that is far from sufficient for serious work). The SiCortex systems were built for high density, and even they only got to about 8TB in something less than 3x the physical space. Those are piddling numbers for storage, at a time when installations with petabytes are no longer uncommon. It's also a very costly way to get that capacity: you're paying for RAM, for the processors that go with it, and for the power to run both. It had better be worth it. Is it? Only if some small and easily identified part of your data is red-hot, the rest is all ice-cold, and your operations on that data distribute evenly enough that you don't have communication hot spots that would make memory vs. disk irrelevant. That's a pretty limited set of conditions.

Still not convinced? OK, look at it this way. A lot of people have workloads more like Facebook's than TPC-C's, and it just so happens that there's some information out there about how well caching has worked for Facebook. According to James Hamilton, as of last April Facebook was serving 475K images per second out of 6.5B total. Those numbers are certainly higher today – I've heard 600K and 15B – but they're the basis for some other interesting numbers, so let's stick with them. Using 1TB of memory cache across 40 servers, they get a 92% hit rate. That still leaves multiple gigabytes per second that have to be served from disk – just for pictures, a year ago. For everything, today, they surely need dozens of servers to achieve the necessary bandwidth, and dozens of disks per server to achieve the necessary capacity. Facebook is far from typical, but it's also far from the "in-memory everything" crowd's model and moving further away every day. Where Facebook is leading, many others will follow.
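The "multiple gigabytes per second" claim is easy to sanity-check. The request rate and hit rate are from the text; the average photo size is my assumption, since it wasn't published, so take the result as an order-of-magnitude figure:

```python
# Sanity check on Facebook's cache-miss bandwidth.
requests_per_sec = 475_000   # image requests/sec (from the text)
hit_rate = 0.92              # memcache hit rate (from the text)
avg_image_kb = 100           # assumed average photo size, illustrative only

misses_per_sec = requests_per_sec * (1 - hit_rate)
disk_gb_per_sec = misses_per_sec * avg_image_kb / (1024 * 1024)

print(f"{misses_per_sec:,.0f} misses/sec, ~{disk_gb_per_sec:.1f}GB/s from disk")
```

Even a 92% hit rate leaves 38,000 misses per second, and at an assumed 100KB per photo that's on the order of 3-4GB/s of disk traffic that the cache does nothing for.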

Sooner or later, your caches (including memory used as a cache rather than as a functionally different operational resource) won't be big enough to provide adequate performance if the next level of your storage hierarchy is too slow. Caches can still be useful as a way to optimize use of that next level, for example by avoiding redundant requests or turning random small-block I/O into large-block sequential I/O, but they can't hide its weaknesses. Parallelism and horizontal distribution are the real long-term keys to scalability, for I/O as well as for computation. If you really want scalability, you have to put parallelism at the center of your design, with caches in a supporting role.
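That "supporting role" is worth making concrete. Here's a minimal sketch, with all names invented for illustration, of a write-back buffer that absorbs small random writes in memory and flushes them to the backing store as one sorted, sequential pass – optimizing the slow level rather than pretending it isn't there:

```python
# Illustrative write-back buffer: coalesce random small writes, flush
# them in offset order so the backing store sees sequential I/O.
# Not any real system's API; a dict stands in for the slow store.

class WriteBackBuffer:
    def __init__(self, backing, flush_threshold=8):
        self.backing = backing           # the slow next level
        self.dirty = {}                  # offset -> data, held in memory
        self.flush_threshold = flush_threshold

    def write(self, offset, data):
        # Coalescing: a later write to the same offset replaces the
        # earlier one, so only the final version ever hits the disk.
        self.dirty[offset] = data
        if len(self.dirty) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One pass in offset order instead of many scattered writes.
        for offset in sorted(self.dirty):
            self.backing[offset] = self.dirty[offset]
        self.dirty.clear()

store = {}
buf = WriteBackBuffer(store)
for off in (96, 8, 40, 8):              # random small writes, one repeated
    buf.write(off, f"block@{off}")
buf.flush()
print(sorted(store))                    # backing store was written in order
```

Note what this buys you and what it doesn't: the repeated write to offset 8 never reaches the store, and the flush is sequential, but total capacity and sustained bandwidth are still whatever the backing store can deliver.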