Object Store File Systems

Several years ago, Amazon created something called S3 (Simple Storage Service). The "simple" part was based on the premise that distributed file systems are too complex, inhibiting scalability while providing too little marginal value to users. According to that theory, a system with a simpler API and simpler semantics (e.g., weak consistency) should be preferable. It's an appealing story, and it has led to many imitators, most notably OpenStack's Swift.

Personally, I've always viewed the "file systems are too complex" claim with skepticism bordering on contempt, but that's neither here nor there. Even if we accept that claim for the sake of argument, there's another claim I've been seeing that remains beyond the pale: that somehow implementing a distributed file system on top of an S3-like object store can make the world better. No, it can't. However complicated a distributed file system might be to begin with, it sure doesn't get any less complicated when you stick an alien (usually HTTP-based) API in the middle. Overcoming the impedance mismatch between that API, with its associated consistency and durability semantics, and what a file system requires will always involve extra work and extra potential for failure. Implementing a file system on top of a richer kind of object API such as Ceph's RADOS can make sense, but whatever kind of file system you could implement on top of S3/Swift objects could be implemented better on top of a more compatible abstraction. There are in fact plenty of people who have been doing exactly that for years, and they've gotten pretty good at it. You're not going to improve on that with a design that people in that community already tried and have long since improved upon.
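To make "extra work and extra potential for failure" a bit more concrete, here is a minimal sketch, assuming a toy in-memory store that, like the basic S3/Swift object model, only lets you read or replace whole objects. `ToyObjectStore` and `write_at` are hypothetical names made up for illustration, not any real client library's API. Emulating an ordinary file-system partial write on such a store means fetching and re-uploading the entire object, and two clients doing that at the same time can silently lose an update.

```python
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an S3/Swift-style API:
    whole objects only, no partial writes, no locking (an assumption)."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        return self._objects[key]

    def put(self, key, data):
        self._objects[key] = bytes(data)


def write_at(store, key, offset, data):
    """Hypothetical helper: emulate a file-system partial write via GET + PUT.

    Returns the number of bytes re-uploaded, to show the extra work."""
    buf = bytearray(store.get(key))           # 1. fetch the whole object
    end = offset + len(data)
    if end > len(buf):
        buf.extend(b"\0" * (end - len(buf)))  # writing past the end fills the gap with zeros
    buf[offset:end] = data                    # 2. patch it locally
    store.put(key, buf)                       # 3. upload the whole object again
    return len(buf)


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("big.dat", b"a" * 1_000_000)
    # Changing 4 bytes re-uploads a megabyte.
    print(write_at(store, "big.dat", 10, b"ZZZZ"))  # 1000000

    # Two clients doing read-modify-write at the same time: nothing in the
    # object API orders them, so the later PUT silently wins.
    store.put("f", b"aaaaaaaa")
    a = bytearray(store.get("f"))
    b = bytearray(store.get("f"))
    a[0:2] = b"XX"
    b[6:8] = b"YY"
    store.put("f", a)
    store.put("f", b)
    print(store.get("f"))  # b'aaaaaaYY': client A's update is lost
```

A real object-store-backed file system has to add its own machinery (locking, versioning, chunking, metadata services) to paper over exactly these gaps, which is the extra complexity in question.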

It's logically inconsistent to say that file systems in their native form are too complex, but file systems implemented on top of simple object stores wouldn't be. It's not just incorrect; it's impossible. Anyone making such a claim either doesn't know the truth or doesn't care about it. Either way, would you trust your data to someone who's trying to sell you the data-storage equivalent of homeopathy or perpetual-motion machines?
