As I’m sure you’ve all noticed by now, I’ve become a bit sensitive about people bashing GlusterFS performance. It’s really hard to make even common workloads run well when everything has to go over a network. It’s impossible to make all workloads run well in that environment, and when people blame me for the speed of light I get a bit grouchy. There are a couple of alternatives that I’ve gotten particularly tired of hearing about, not because I fear losing in a fair fight but because I feel that their reality doesn’t match their hype. Either they’re doing things that I think a storage system shouldn’t do, or they don’t actually perform all that well, or both. When I found out that I could get my hands on some systems with distributed block storage based on one of these alternatives, it didn’t take me long to give it a try.

The first thing I did was check out the basic performance of the systems, without even touching the new distributed block storage. I was rather pleased to see that my standard torture test (random synchronous 4KB writes) would ramp very smoothly and consistently up to 25K IOPS. That’s more than respectable. That’s darn good – better IOPS/$ than I saw from any of the alternatives I mentioned last time. So I spun up some of the distributed stuff and ran my tests with high hopes.
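For the curious, here's a rough sketch of what that torture test does. This is not the actual harness I used; the file path, sizes, thread count, and duration are all placeholder assumptions, and a real run would target a raw device for much longer. The key ingredient is `O_SYNC`, which forces every 4KB write to reach stable storage before the call returns.

```python
import os, random, tempfile, threading, time

# All of these knobs are assumptions for illustration, not the original rig.
BLOCK = 4096              # 4KB writes, as in the torture test
SPAN = 64 << 20           # 64MB target region (a real test would use a raw device)
SECONDS = 2
THREADS = 4

fd, path = tempfile.mkstemp()
os.ftruncate(fd, SPAN)    # sparse file standing in for the block device
os.close(fd)

counts = [0] * THREADS

def worker(i):
    # O_SYNC makes each write synchronous: it must hit stable storage
    # before returning, which is what makes this workload so brutal.
    f = os.open(path, os.O_WRONLY | os.O_SYNC)
    buf = os.urandom(BLOCK)
    deadline = time.monotonic() + SECONDS
    while time.monotonic() < deadline:
        off = random.randrange(SPAN // BLOCK) * BLOCK  # aligned random offset
        os.pwrite(f, buf, off)
        counts[i] += 1
    os.close(f)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
os.unlink(path)
print("IOPS:", sum(counts) // SECONDS)
```

Ramping the thread count and plotting the resulting IOPS is what produces curves like the ones below.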

[Figure: synchronous IOPS]


Ouch. That’s not totally awful, but it’s not particularly good and it’s not particularly consistent. Certainly not something I’d position as high-performance storage. At the higher thread counts it gets into a range that I wouldn’t be too terribly ashamed of for a distributed filesystem, but remember that this is block storage. There’s a local filesystem at each end, but the communication over the wire is all about blocks. It’s also directly integrated into the virtualization code, which should minimize context switches and copies. Thinking that the infrastructure just might not be handling the hard cases well, I tried throwing an easier test at it – sequential buffered 64KB writes.
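The "easier" test is about as simple as I/O gets; a minimal sketch (again with assumed sizes, not what I actually ran) looks like this. Note the absence of `O_SYNC`: the page cache absorbs these writes, so the storage system gets every opportunity to batch and reorder them.

```python
import os, tempfile, time

# Assumed sizes for illustration; the real test ran far more data.
BLOCK = 64 * 1024         # 64KB writes
TOTAL = 16 << 20          # 16MB total

fd, path = tempfile.mkstemp()
buf = b"\0" * BLOCK
start = time.monotonic()
written = 0
while written < TOTAL:
    os.write(fd, buf)     # buffered and sequential: no sync, offset just advances
    written += BLOCK
elapsed = time.monotonic() - start
os.close(fd)
os.unlink(path)
print(f"{written // BLOCK} buffered writes in {elapsed:.4f}s")
```

Any storage stack worthy of the name should eat this workload alive, which is what makes the result below so startling.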

[Figure: buffered IOPS]

WTF? That’s even worse than the synchronous result! You can’t see it at this scale, but some of those lower numbers are single-digit IOPS. I did the test three times, because I couldn’t believe my eyes, then went back and did the same for the synchronous test. I’m not sure whether the consistent parts (such as the nose-dive from 16 to 18 threads all three times) or the inconsistent parts bother me more. That’s beyond disappointing; it’s beyond bad; it’s into shameful territory for everyone involved. Remember 25K IOPS for this hardware using local disks? Now even the one decent sample can’t reach a tenth of that, and that one sample stands out quite a bit from all the rest. Who would pay one penny more for so much less?

Yes, I feel better now. The next time someone mentions this particular alternative and says we should be more like them, I’ll show them how the lab darling fared in the real world. That’s a load off my mind.