Back in the day, a lot of people in computing were afflicted with network phobia. They didn’t understand networking, so they avoided it. Even when they had to provide services over a network, they left their systems as centralized as possible. They hid behind RPC abstractions – a flawed approach which Eric Brewer demolished in the often-forgotten second half of his CAP Theorem presentation. They hid behind transaction monitors and load balancers, and contorted their internal architecture to work with those. If they cared about availability, they implemented pairwise failover. Finally, twenty years or so later, we have a generation of programmers who grew up with networking and who are comfortable working on truly distributed systems. That’s wonderful. Unfortunately, eliminating one barrier often reveals another, and in this case the second barrier is storage phobia.

A lot of people fear or hate working with storage. They hate the protocols, especially Fibre Channel, because they’re loaded down with excess complexity. Some of that complexity needs to be there, because Fibre Channel does a lot more than plain old Ethernet. A lot more is there because of standards-body silliness. My favorite example is having to identify devices using Inquiry strings which might be presented in several different formats crossed with several different encodings, all so that every T11 member could be satisfied. (Handling all these combinations instead of just one or two, BTW, is the kind of thing that distinguishes software written by diligent professionals from Amateur Hour.) InfiniBand and iSCSI are just different standards groups making the same mistakes. Even more than standards and protocols, though, many people fear or hate working with storage implementations. It wasn’t so long ago that many significant kinds of reconfiguration on an EMC Symmetrix required typing hex into a console followed by a full restart. Yes, it was even more primitive than Windows. Every array, every switch, every HBA has its own “quirks” with which customers inevitably develop an unwelcome intimacy. Believe me, people who’ve worked in these trenches for many years understand all of that much better than anyone looking in from outside ever will.
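To make that Inquiry-string example concrete, here is a minimal sketch of what decoding SCSI device-identification data (INQUIRY VPD page 0x83) involves: every designator pairs a code set (binary, ASCII, UTF-8) with a designator type, and robust software has to normalize all of the combinations rather than just the one or two its author happened to see. The field offsets follow the SPC layout; the function and constant names are mine, and error handling is deliberately thin.

```python
# Sketch: normalizing SCSI device-identification designators from INQUIRY
# VPD page 0x83, where each designator pairs a code set (how the bytes are
# encoded) with a designator type (what kind of identifier it is).
# Offsets follow the SPC layout; names and simplifications are mine.

CODE_SETS = {1: "binary", 2: "ascii", 3: "utf-8"}
DESIGNATOR_TYPES = {
    0: "vendor-specific",
    1: "t10-vendor-id",
    2: "eui-64",
    3: "naa",
    8: "scsi-name-string",
}

def parse_vpd83(page: bytes):
    """Yield (designator_type, code_set, normalized_value) tuples."""
    page_len = int.from_bytes(page[2:4], "big")   # bytes 2-3: page length
    offset, end = 4, min(4 + page_len, len(page))
    while offset + 4 <= end:
        code_set = page[offset] & 0x0F            # low nibble of byte 0
        dtype = page[offset + 1] & 0x0F           # low nibble of byte 1
        dlen = page[offset + 3]                   # byte 3: designator length
        raw = page[offset + 4 : offset + 4 + dlen]
        # The same logical identifier may show up as raw binary, padded
        # ASCII, or UTF-8 depending on the vendor; normalize every case.
        if code_set == 1:
            value = raw.hex()
        elif code_set == 2:
            value = raw.decode("ascii", errors="replace").strip()
        else:
            value = raw.decode("utf-8", errors="replace").strip("\x00 ")
        yield (DESIGNATOR_TYPES.get(dtype, f"type-{dtype}"),
               CODE_SETS.get(code_set, f"code-set-{code_set}"),
               value)
        offset += 4 + dlen
```

Even this sketch glosses over the association and protocol-identifier fields, which only multiply the combinations further.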

Nonetheless, people from outside often seem to think they can do better. Do they try to do better by preserving some of the existing technology’s strengths and addressing its weaknesses? Sadly, no. Instead, they fixate on disk-failure-rate statistics from two years ago, presenting them as “evidence” of their already-held conclusion that conventional storage is something to be avoided. Is the total MTDL (Mean Time to Data Loss) of an internally redundant shared array the same as that for individual disks? Are disks housed in servers – and thus subject to the servers’ failure rate on top of their own – going to fare better? Does the lack of moving parts make RAM infallible? No, no, and NO. But, but... those disk-failure-rate numbers are scary! Surely any alternative must be preferable? Argh. Letting fear of conventional storage drive the creation of “solutions” that are just as complex without the benefits of being as general, as well tested, or as well documented is a mistake. (Open but undocumented and untested can be worse than closed, BTW, if the cost of reverse-engineering and fixing the implementation is greater than the cost of licensing would have been.) Such attempts generally lose even when considered alone, and lose even more once the effects of fragmentation and incompatibility are factored in.
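For a sense of why those rhetorical questions answer themselves, here is a back-of-envelope MTDL comparison using the standard independent-failure approximation for a parity-protected group: data is lost only if a second disk fails during the repair window of the first. The MTTF and repair-time numbers below are illustrative assumptions, not measurements, and the model ignores correlated failures and unrecoverable read errors.

```python
# Back-of-envelope MTDL comparison: one disk vs. an internally redundant
# group. Standard independent-failure approximation; the inputs below are
# illustrative assumptions, not vendor data.

def mttdl_raid5(n_disks, mttf_hours, mttr_hours):
    """Data loss requires a second failure in the same group during
    the repair window of the first (classic approximation)."""
    return mttf_hours ** 2 / (n_disks * (n_disks - 1) * mttr_hours)

mttf = 300_000   # assumed per-disk MTTF in hours (~34 years)
mttr = 24        # assumed rebuild/replacement window in hours

print(f"single disk MTDL : {mttf / 8760:,.0f} years")
print(f"8-disk RAID-5    : {mttdl_raid5(8, mttf, mttr) / 8760:,.0f} years")
# With these assumptions: ~34 years vs. ~7,600 years. The redundant group
# wins by orders of magnitude, which is the point of the rhetorical
# questions above.
```

The comparison a new design actually has to win is against that kind of number, not against a lone disk’s.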

Yes, even the best storage systems can fail. Yes, software must ultimately be responsible for recovering from failure. Yes, many programs make the incorrect assumption that high-end storage fails too rarely to worry about, and thus abdicate their own responsibility. That doesn’t mean there’s no value in storage which handles some failures transparently and others without losing access to even a single replica. It certainly doesn’t mean that some new piece of software built on that same incorrect assumption will make anything better. Good software is based on balance, not extremism. That means not making the software more complex by coding to the least common denominator. It means taking advantage of external facilities that are already common and cost-effective, and letting the software focus on the problems that it alone can solve.