Lies, Damn Lies, and Parallels

This apparently happened a while ago, but it recently came to my attention via LWN that James Bottomley has made the claim that "Gluster sucks" (not a paraphrase, those seem to be his exact words). Well, I couldn't just let that go by, could I? Why would he say such a thing? The only visible thing is a recent presentation at the Parallels Summit, which is - to put it bluntly - just full of lies. Let's take a look at just how bad it is.

Our starting point is a performance graph on slide 3, purportedly showing how Parallels Cloud Storage is way ahead of everyone else in terms of aggregate Gbps . . . but wait. How many clients are we talking about? How many servers? He doesn't say. What kind of hardware? He doesn't say. What kind of configuration? He doesn't say. What kind of workload? He doesn't say. What does it even mean to put up numbers for both distributed storage systems (running on what kind of network?) and "DAS - 15,000 RPM"? Is he comparing apples to oranges, or apples to whole crates full of oranges? That graph is the absolute worst kind of fact-free marketing. It's utterly useless for drawing any engineering conclusions about anything. Onward to slide 5. What does this mean?

File based Storage

...

suffers from metadata issues on the server

"The" server, eh? Where have I heard that before? Oh yeah, right here. He's making the same mistake that James Hughes did, of thinking that because he can't think of a better way to handle metadata then nobody can. To quote Schopenhauer, "Everyone takes the limits of his own vision for the limits of the world." Onward to slide 7.

Using a fixed size object incurs no metadata overhead whatsoever

Here he has inadvertently identified a deficiency not in real cloud filesystems but in the Parallels alternative. Fixed-size objects are just not a reasonable limitation in many use cases. Any system designed around such a limitation is hopelessly weak compared to one that handles the more general case. As I explained the last time Parallels was slinging this kind of FUD, the same can be said about systems that don't allow real sharing of data - including both object and block stores. People wouldn't still be making billions of dollars per year selling NAS if users didn't want those more general semantics. Onward to slide 8.

Fuse is the Linux Userspace Filesystem

Main problem is it’s incredibly SLOW

So why has FUSE historically been slow? Because the kernel hackers whose sign-off was needed to make it less slow were extremely resistant to any change that would have that effect. People like James Bottomley himself. When you're wrong for so long, it's disingenuous to take so much credit for finally ceasing your own resistance to change. Onward to slide 9.

Eventual Consistency is the usual norm

...

Gluster (does have a much slower strong consistency quorum enforcement mode)

The first part is highly misleading. Eventual consistency is not the norm in GlusterFS. In normal operation, updates are fully synchronous and there will be no inconsistency beyond that which exists in any distributed system while an update is still in progress. The only time there's any observable inconsistency is in the presence of failures, and not just any failure but the kind or number that can lead to split-brain. Also, quorum enforcement does not make anything slower. It has zero performance impact; that's just more FUD.
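To make the quorum point concrete, here's a toy sketch of a synchronous replicated write guarded by a majority check. This is not GlusterFS code; `Replica`, `has_quorum`, and `write` are names I made up for illustration, and the simple-majority rule is an assumption rather than a description of GlusterFS's actual quorum options. The point is only that the check is a comparison on state the client already has, not an extra round trip.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one replica; not a GlusterFS structure."""
    up: bool = True
    blocks: dict = field(default_factory=dict)


def has_quorum(reachable: int, total: int) -> bool:
    # Assumed rule: strictly more than half of the replicas must be reachable.
    return 2 * reachable > total


def write(replicas, key, value) -> bool:
    """Synchronously update every reachable replica, or refuse the write."""
    live = [r for r in replicas if r.up]
    # The quorum test is a counter comparison; it issues no additional I/O.
    if not has_quorum(len(live), len(replicas)):
        return False  # refuse rather than risk split-brain
    for r in live:
        r.blocks[key] = value  # the update lands on all live replicas
    return True
```

With three replicas, a write still succeeds after one of them fails and is refused once two have failed, which is the only situation where the check changes anything at all.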

Basically, what Bottomley has provided is just one big hatchet job based on misleading or outright false statements. The fact is that GlusterFS can do many things that Parallels Cloud Storage can't. It provides full filesystem semantics, truly shared data, geo-replication (still a hand-wave for PCS), Hadoop and Swift integration, and many other features. Yes, it might be true that PCS can outperform GlusterFS for the only use case that PCS can handle, on an unspecified configuration with an unspecified workload. Or maybe not, since those details are missing and the software itself isn't open so that others can make their own comparison.

In my experience, people only make such totally bullshit comparisons when legitimate ones don't paint the picture they want. It's not science. It's not engineering. It's not even marketing done right. It's just lying.
