SAN Stalwarts and Wistful Thinking

I've often said that open-source distributed storage solutions such as GlusterFS and Ceph are on the same side in a war against more centralized proprietary solutions, and that we have to finish that war before we start fighting over the spoils. Most recently I said that on Hacker News, in response to what I saw as a very misleading evaluation of GlusterFS as it relates to OpenStack. In some of the ensuing Twitter discussion, Ian Colle alerted me to an article by Randy Bias entitled Converged Storage, Wishful Thinking & Reality. Ian is a Ceph/Inktank guy, so he's an ally in that first war. Randy presents himself as being on that side too, but when you really look at what he's saying it's pretty clear he's on the other team. To see why, let's look at the skeleton of his argument.

  • "Elastic block storage" is a good replacement for traditional SAN/NAS.

  • "Distributed storage" promises to replace everything but can't.

  • The CAP theorem is real, failures are common, and distributed storage doesn't account for that.

The first two points are hopelessly muddled by his choice of terms. When people in this space hear "elastic block storage" they're likely to think it means Amazon's EBS. However, Amazon's EBS is distributed storage. Try to read the following as though Randy means Amazon EBS.

Elastic Block Storage (EBS) is simply an approach to abstracting away SAN/NAS storage {from page 4}

Elastic block storage is neither magic nor special. It’s SAN resource pooling. {from Twitter}

That conflicts with everything else I've heard about Amazon EBS. I even interviewed for that team once, and they sure seemed to be asking a lot of questions that they wouldn't have bothered with if EBS weren't distributed storage. Amazon's own official description of EBS bears this out.

Amazon EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component.

Servers, eh? Not arrays. That sounds a lot like distributed storage, and very unlike the "SAN resource pooling" Randy talks about. Clearly he's not talking about Amazon EBS vs. distributed storage because one is a subset of the other. What he's really talking about is SAN-based vs. distributed block storage. In other words, his first point is that SAN hardware repackaged as "elastic block storage" can displace SAN hardware sold as itself. Yeah, when you cut through all of the terminological insanity it ends up sounding rather silly to me too.

Randy's second point is that users need multiple tiers of storage, and a distributed storage system that satisfies the "tier 1" (lowest latency) role would be a poor fit for the others. Well, duh. The same is true of his alternative. The fundamental problem here seems to be an assumption that each storage technology can only be deployed one way. That's kind of true for proprietary storage systems, where configurations are tightly restricted, but open storage systems are far more malleable. You can buy different hardware and configure it different ways to satisfy different needs. If you want low latency you do one thing and if you want low cost you do another, but it's all the same technology and the same operations toolset either way. That's much better than deploying two or more fundamentally different storage-software stacks just because they're tied to different hardware.

The homogeneity assumption is especially apparent in Randy's discussion of moving data between tiers (HSM) as though it's something distributed storage can't do. In fact, there's nothing precluding it at all, and it even seems like an obvious evolution of mechanisms we already have. With features like GlusterFS's upcoming data classification you'll be able to combine disparate types of storage and migrate automatically between them, if you want to and according to policies you specify. Again, this can be done better in a single framework than by mashing together disparate systems and slathering another management layer on top.
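To make the point concrete, here's a minimal sketch of the kind of policy-driven migration an HSM-style feature implies. This is a hypothetical illustration, not GlusterFS's actual data-classification code; the tier names, threshold, and function names are all mine.

```python
import time

# Hypothetical sketch of policy-driven tiering (HSM-style). A policy maps
# access recency to a tier, and a migration pass applies that policy to
# every object. Real systems add throttling, scheduling, and actual data
# movement; this only shows the placement decision.

HOT_THRESHOLD = 3600  # seconds; objects touched within the last hour stay hot

def select_tier(last_access, now, threshold=HOT_THRESHOLD):
    """Pick a tier name based on how recently the object was accessed."""
    return "ssd" if (now - last_access) <= threshold else "hdd"

def migration_pass(objects, now):
    """Return {name: tier} placements after applying the policy.

    `objects` maps object name -> (current_tier, last_access_time)."""
    placements = {}
    for name, (tier, last_access) in objects.items():
        placements[name] = select_tier(last_access, now)
    return placements

now = time.time()
objects = {
    "logfile": ("ssd", now - 10),     # accessed seconds ago: stays on SSD
    "archive": ("ssd", now - 86400),  # cold for a day: demoted to HDD
}
print(migration_pass(objects, now))
```

The point is that the policy and the placement logic live in one framework; swapping the threshold or adding tiers is a configuration change, not a second storage stack.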

Lastly, let's talk about CAP. Randy makes a big deal of the CAP theorem and massive failure domains, leading to this turd of a conclusion:

Distributed storage systems solve the scale-out problem, but they don’t solve the failure domain problem. Instead, they make the failure domain much larger

Where the heck does that idea come from? I mean, seriously, WTF? I think I'm on pretty safe ground when I say that I know a bit about CAP and failure domains. So do many of my colleagues on my own or similar projects. The fact is that distributed storage systems are very CAP-aware. One of the main tenets of CAPology is that no network is immune to partitions, and that includes the networks inside Randy's spastic block storage. Does he seriously believe a traditional SAN or NAS box will keep serving requests when its internal communications fail? Of course not, and the reason is very simple: they're distributed storage too, just wrapped in tin. We all talk to each other at the same conferences. We're all bringing the same awareness and the same algorithms to bear on the same problems. Contrary to Randy's claim, the failure domains are exactly the same size relative to TB or IOPS served. The difference is in the quality of the network implementation and the system software that responds to network events, not in the basic storage model. Open-source distributed storage lets you build essentially the same network and run essentially the same algorithms on it, without paying twice as much for some sheet metal and a nameplate.
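The "same algorithms" claim can be illustrated with the quorum rule that CAP-aware replicated systems commonly use during a partition. This is a generic sketch of majority-quorum logic, not any particular product's implementation; the node names and replica count are made up.

```python
# Generic majority-quorum sketch: during a partition, only the side that
# can reach a strict majority of replicas keeps accepting writes, while
# the minority side refuses them rather than let the copies diverge.
# This holds whether the replicas live in commodity servers or inside a
# SAN chassis -- the failure-handling logic is the same.

REPLICAS = {"node-a", "node-b", "node-c"}  # three-way replication

def can_serve_writes(reachable):
    """A partition side serves writes only with a strict majority of replicas."""
    return len(reachable & REPLICAS) > len(REPLICAS) // 2

# Partition splits {a, b} from {c}: the majority side stays available.
print(can_serve_writes({"node-a", "node-b"}))  # True
print(can_serve_writes({"node-c"}))            # False
```

Nothing in that rule cares whether the interconnect is Ethernet between servers or a backplane inside an array; the tin doesn't change the math.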

In conclusion, then, Randy's argument about storage diversity and tiering is bollocks. His argument about CAP and failure domains is something even more fragrant. People who continue to tout SANs as a necessary component of a complete storage system are only serving SAN vendors' interests - not users'.
