Over the last few days, I’ve tried to get a couple of parallel filesystems working in various cloud environments. Why? Because I think a parallel filesystem – as we know such beasts today – is an essential building block for the cloud filesystem I’ll be working on. Creating a true cloud filesystem is not a matter of just slapping a new name on an existing parallel not-quite-filesystem. It’s not about minor tweaks to an existing parallel filesystem so that its cluster-developed protocols (barely) work in a more distributed environment. There’s a lot more to it than that, which is precisely why I want to get the parallel-filesystem part settled and move on to working on the other bits ASAP.

Anyway, I rejected Lustre right off the bat because the server side needs kernel patches, and that ran up against something I hadn’t realized about cloud services: they make it extremely difficult to run your own kernel. Some simply don’t allow it at all. Others make the process really cumbersome, and/or don’t provide enough source for you to build a kernel compatible with the ones they provide – which means, for example, that you’ll get even worse I/O performance because you have to run even more stuff in fully-virtualized mode. A few don’t even provide enough source for you to build your own kernel modules, which has given me a new appreciation for the Affero modifications of the GPL.

I started with Ceph, because it seems in many ways like the coolest of the active projects out there. First I tried to build it on Rackspace, and I was in fact able to build and install it there. Then I spent a lot of time debugging startup failures, which turned out to be caused by the default firewall rules that Rackspace sets up on your instances. I wasn’t terribly pleased with Rackspace for not providing much warning or troubleshooting advice for what I imagine is a pretty common pitfall of using their cloud, or with Ceph for debugging facilities that did very little to shed light on the matter, but I did get it to work. Unfortunately, performance was extremely poor and it would only stay up a little while before the client would hang. I suspect the problem has to do with a couple of Ceph-related kernel patches that Rackspace’s kernels surely don’t have, so I’m not blaming anyone, but it was a bit of a disappointment nonetheless. For the record, I tried the same on AWS with the same result, except that I didn’t run into the same firewall silliness (this is exactly the kind of area in which AWS seems to have thought things through a little more thoroughly than their competitors).
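
In hindsight, a quick connectivity check from the client would have pointed at the firewall much sooner than staring at the logs did. Here’s a minimal sketch of that kind of check in Python; the server address is made up, and the ports are just the usual Ceph defaults (6789 for the monitor, 6800 and up for OSDs), not anything specific to my setup.

```python
#!/usr/bin/env python
"""Quick check for blocked ports between a Ceph client and server.

A sketch of the sanity check that would have saved me time: the port
numbers below are the usual Ceph defaults (6789 for the monitor,
6800+ for OSDs), not values taken from my actual configuration.
"""
import socket
import sys

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
        sock.close()
        return True
    except (socket.timeout, socket.error):
        return False

if __name__ == "__main__":
    # Hypothetical server address; pass your monitor/OSD host as the first argument.
    server = sys.argv[1] if len(sys.argv) > 1 else "10.0.0.2"
    for port in [6789, 6800, 6801, 6802]:
        status = "open" if can_connect(server, port) else "blocked or closed"
        print("%s:%d %s" % (server, port, status))
```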

So, what to try next? I probably should have gone with PVFS, which I still think is the best of its generation. I ended up going with GlusterFS, though, because I knew from past experience that it had the simplest build process of anything I’d found. This time I started on AWS, and got it built pretty quickly . . . or so I thought. It turns out that the build had silently skipped some FUSE pieces because the fuse-devel prerequisite was missing, but that was easy enough to fix. Then I had a heck of a time trying to get a mount to work properly. The documentation mentions two or three different formulae for the client mount command, but they’re all just a little bit wrong. Even when I finally hit on the right general invocation, it kept trying to mount on /mnt even though I’d told it to use /mnt/gfs-data. Eventually I managed to fix that too, and run some reasonable-scale tests without any stability problems.
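
For anyone hitting the same mount weirdness, the quickest way to see where the client actually ended up mounted is just to look at /proc/mounts. Here’s a rough Python version of that check; it assumes nothing GlusterFS-specific beyond the filesystem type containing “gluster” or starting with “fuse”, which is how the FUSE client typically shows up.

```python
#!/usr/bin/env python
"""Report where GlusterFS actually mounted, by scanning /proc/mounts.

A minimal sketch of the check I ended up doing by hand: it looks for
any mount whose filesystem type mentions "gluster" or starts with
"fuse" and prints the device and mount point, so a mount that quietly
landed somewhere other than where you asked shows up immediately.
"""

def gluster_mounts(path="/proc/mounts"):
    """Yield (device, mountpoint, fstype) for GlusterFS/FUSE mounts."""
    with open(path) as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) < 3:
                continue
            device, mountpoint, fstype = fields[0], fields[1], fields[2]
            if "gluster" in fstype or fstype.startswith("fuse"):
                yield device, mountpoint, fstype

if __name__ == "__main__":
    for device, mountpoint, fstype in gluster_mounts():
        print("%s mounted on %s (type %s)" % (device, mountpoint, fstype))
```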

The results are not bad. With only two servers and four I/O threads on one client, I was able to get up to 70MB/s write and 90MB/s read. These numbers aren’t directly comparable to my previous ones, since I was using large (64-bit) instances this time, but they’re quite respectable for an untuned configuration running over virtualized Ethernet with a 1500-byte MTU. I think I’ll spin up a couple more instances to check scaling, and maybe try the same experiment on Rackspace to get some more data points, but it’s a good start. I’ll probably be posting more results after my official job start on Monday.
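
If you want to try something similar yourself, a test along these lines will do the job: a few threads streaming large sequential writes into the mount and timing the aggregate result. The Python sketch below is only an illustration of that shape of test – the mount point, file size, and block size are placeholders, not the exact parameters I used.

```python
#!/usr/bin/env python
"""Rough multi-threaded write-throughput test against a mounted filesystem.

A sketch of the general shape of test, not the actual tool I ran: the
mount point, file size, and block size are made up for illustration.
It writes one file per thread and reports aggregate MB/s.
"""
import os
import threading
import time

MOUNT_POINT = "/mnt/gfs-data"   # hypothetical client mount point
THREADS = 4                     # matches the four I/O threads mentioned above
FILE_SIZE = 512 * 1024 * 1024   # 512MB per thread (assumed, for illustration)
BLOCK_SIZE = 1024 * 1024        # 1MB writes

def writer(index):
    """Write FILE_SIZE bytes of zeros to a per-thread file under the mount."""
    path = os.path.join(MOUNT_POINT, "stream-%d.dat" % index)
    block = b"\0" * BLOCK_SIZE
    with open(path, "wb") as out:
        for _ in range(FILE_SIZE // BLOCK_SIZE):
            out.write(block)
        out.flush()
        os.fsync(out.fileno())  # make sure the data actually reached the servers

if __name__ == "__main__":
    start = time.time()
    threads = [threading.Thread(target=writer, args=(i,)) for i in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    total_mb = THREADS * FILE_SIZE / (1024.0 * 1024.0)
    print("wrote %.0fMB in %.1fs: %.1fMB/s" % (total_mb, elapsed, total_mb / elapsed))
```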