Apparently Artur Bergman did a very popular talk about SSDs recently. It’s all over my Twitter feed, and led to a pretty interesting discussion at High Scalability. I’m going to expand a little on what I said there.
I was posting to comp.arch.storage when Artur was still a little wokling, so I’ve had ample opportunity to see how a new technology gets from “exotic” to mainstream. Along the way there will always be some people who promote it as a panacea and some who condemn it as useless. Neither position requires much thought, and progress always comes from those who actually think about how to use the Hot New Thing to complement other approaches instead of expecting one to supplant the other completely. So it is with SSDs, which are a great addition to the data-storage arsenal but cannot reasonably be used as a direct substitute either for RAM at one end of the spectrum or for spinning disks at the other. Instead of putting all data on SSDs, we should be thinking about how to put the right data on them. As it turns out, there are several levels at which this can be done.
- For many years, operating systems have implemented all sorts of prefetching to get data into RAM when it’s likely to be accessed soon, and bypass mechanisms to keep data out of RAM when it’s not (e.g. for sequential I/O). Processor designers have been doing similar things going from RAM to cache, and HSM folks have been doing similar things going from tape to disk. These basic approaches are also applicable when the fast tier is flash and the slow tier is spinning rust; a minimal sketch of the promote/demote logic appears after this list.
- At the next level up, filesystems can evolve to take better advantage of flash. For example, consider a filesystem designed to keep not just journals but the metadata itself on flash, with the data on disks. In addition to the performance benefits, this would allow the two resources to be scaled independently of one another. Databases and other software at a similar level can make similar improvements.
- Above that level, applications themselves can make useful distinctions between warm and cool data, keeping the former on flash and relegating the latter to disk. It even seems that the kind of data being served up by Wikia is particularly well suited to this, if only they’d decided to think and write code instead of throwing investor money at their I/O problems.
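To make the tiering idea concrete, here is a minimal sketch (in Python, with a hypothetical `TieredStore` wrapping any dict-like `disk` backing store) of the promote/demote logic that an OS cache, a filesystem, or an application could apply. It’s an illustration of the general approach, not anyone’s actual implementation:

```python
from collections import OrderedDict

class TieredStore:
    """Toy two-tier store: a small, fast flash tier in front of a
    large, slow disk tier. Reads promote hot blocks to flash; the
    coldest flash block is demoted when the flash tier fills up."""

    def __init__(self, flash_capacity, disk):
        self.flash = OrderedDict()   # block_id -> data, kept in LRU order
        self.flash_capacity = flash_capacity
        self.disk = disk             # any dict-like slow backing store

    def read(self, block_id):
        if block_id in self.flash:          # fast path: flash hit
            self.flash.move_to_end(block_id)
            return self.flash[block_id]
        data = self.disk[block_id]          # slow path: go to disk
        self._promote(block_id, data)       # cache it for next time
        return data

    def write(self, block_id, data):
        self.disk[block_id] = data          # write through to disk
        self._promote(block_id, data)

    def _promote(self, block_id, data):
        self.flash[block_id] = data
        self.flash.move_to_end(block_id)
        while len(self.flash) > self.flash_capacity:
            self.flash.popitem(last=False)  # drop coldest; disk still has it
```

A real implementation would also bypass the flash tier for large sequential scans (exactly the kind of bypass mechanism mentioned above), so that one backup job can’t flush the entire hot set.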
Basically, what it all comes down to is that you might not need all those IOPS for all of your data. Don’t give me that “if you don’t use your data” false-dichotomy sound bite either. Access frequency falls into many buckets, not just two, and a simplistic used/not-used distinction is fit only for a one-bit brain. If you need a lot of machines for their CPU/memory/network performance anyway, and thus don’t need half a million IOPS per machine, then spending more money to get those IOPS is just a wasteful ego trip. By putting just a little thought into using flash and disk to complement one another, just about anyone should be able to meet their IOPS goals at lower cost and spend the money saved on real operational improvements.
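Since the many-buckets point is really an argument about access-frequency distributions, here’s a small sketch (again Python, with hypothetical per-object access counts) of how to ask the relevant question: what fraction of the I/O load would a small flash tier absorb?

```python
def plan_tiers(access_counts, flash_fraction=0.1):
    """Given {object_id: accesses_per_day} (assumed non-empty), place
    the hottest flash_fraction of objects on flash and report how much
    of the total I/O load that slice absorbs."""
    ranked = sorted(access_counts.items(), key=lambda kv: kv[1], reverse=True)
    cutoff = max(1, int(len(ranked) * flash_fraction))
    hot, cold = ranked[:cutoff], ranked[cutoff:]
    total = sum(access_counts.values())
    return {
        "flash": [obj for obj, _ in hot],
        "disk": [obj for obj, _ in cold],
        "iops_absorbed_by_flash": sum(n for _, n in hot) / total,
    }
```

With the skewed, Zipf-like access patterns most sites see, the hottest ten percent of objects often accounts for the large majority of accesses, which is exactly why a small flash tier plus a little placement logic tends to beat buying flash for everything.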
“I was posting to comp.arch.storage when Artur was still a little wokling.”
You may be right, but that just made the whole article invalid.
Then so is Artur’s profanity- and insult-filled presentation. Dismissing a conclusion because of an inessential part of the argument is fallacious, especially when that part is an obvious and deliberate response in kind. Do you have anything to say on the *substance* of either his argument or mine?
Spot on, Jeff.
DragonflyBSD has Swapcache:
“swapcache is a system capability which allows a solid state disk (SSD) in a swap space configuration to be used to cache clean filesystem data and meta-data in addition to its normal function of backing anonymous memory.”
Hey,
I had 2.5 minutes to make a point; it wasn’t possible to fit everything into that. My Surge and Velocity Europe talks both contain way more data. My intention was more to force people to stop thinking of SSDs as super expensive and unreliable. You accuse me of sound bites, but there’s not much else I can deliver in that time. I still stand by the claim that if you need more IOPS than disks can deliver, SSDs rapidly become cost-effective.
Your analysis of Wikia’s data is mostly correct, though predicting which wiki is going to be popular is not really possible. Old versions of articles are archived away, for example. However, all decisions to use SSDs (and they developed over a three-year process) were made through careful cost/benefit analysis. I dare say that when I left Wikia, we were one of the leanest top-40 websites in existence.
Nearly all our traffic was served from our front-line caches, where we used SSDs as cheap RAM. Our databases were all I/O-bound, so we drastically reduced the number of database servers and removed a lot of caching levels.
So, it might be fun to insult me for wasting our investors’ money, but it really isn’t true.
Another thing: by removing a lot of the code that tried to ensure sequential access patterns, the codebase became much simpler.
Artur