Archive for January, 2012

Hacking Filesystems Is Easy

If you want to hack on distributed filesystems, there is no easier way to get started than by writing a GlusterFS translator. To prove this point, I’ve recently implemented two new translators that are very simple but provide significant benefits in certain situations. These have nothing to do with HekaFS, really, except that HekaFS relies on this same simplicity to do what it does. The first translator does negative lookup caching.

This is a very simple translator to cache “negative lookups” for workloads in which the same file is looked up many times in places where it doesn’t exist. In particular, web scripts with many includes/requires and long include paths can generate hundreds of such lookups per front-end request. Without caching those negative results, that can mean hundreds of back-end network round trips per front-end request, so we cache them. Simple tests of this kind of workload on two machines connected via Gigabit Ethernet show roughly a 3x performance improvement.
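To make the idea concrete, here is a minimal sketch in plain C. It does not use the real GlusterFS translator API; it only models the shape of one: a caching layer sitting on top of a child layer, where each call to the child stands in for a back-end network round trip. A lookup that misses in the child is remembered, and repeated lookups of the same path are answered locally. All names (`neg_cache`, `neg_lookup`, `neg_invalidate`, `demo_backend`, `simulate_requests`), the fixed table size, and the expiry time are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define NEG_SLOTS 64        /* hypothetical fixed table size */
#define NEG_TTL 5           /* hypothetical lifetime of a cached miss, seconds */
#define NEG_PATH_LEN 256

/* One remembered "this path does not exist" result. */
struct neg_entry {
    char path[NEG_PATH_LEN];
    time_t stamp;
    bool used;
};

/* The caching layer. child_lookup stands in for the next translator down;
   each call to it represents one back-end network round trip. */
struct neg_cache {
    struct neg_entry slots[NEG_SLOTS];
    bool (*child_lookup)(const char *path);
    unsigned long child_calls;
};

/* djb2 string hash, used to pick a slot. */
static unsigned long hash_path(const char *s) {
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static struct neg_entry *slot_for(struct neg_cache *c, const char *path) {
    return &c->slots[hash_path(path) % NEG_SLOTS];
}

/* Returns true if path exists. Negative results are cached; positive ones
   are always passed through to the child. */
static bool neg_lookup(struct neg_cache *c, const char *path, time_t now) {
    assert(strlen(path) < NEG_PATH_LEN);   /* debug check only */
    struct neg_entry *e = slot_for(c, path);
    if (e->used && strcmp(e->path, path) == 0 && now - e->stamp < NEG_TTL)
        return false;                      /* answered locally, no round trip */

    c->child_calls++;
    bool exists = c->child_lookup(path);
    if (!exists) {                         /* remember the miss, evicting any old entry */
        snprintf(e->path, sizeof e->path, "%s", path);
        e->stamp = now;
        e->used = true;
    }
    return exists;
}

/* Must be called when path is created or renamed into place, or the
   cache would keep claiming that it does not exist. */
static void neg_invalidate(struct neg_cache *c, const char *path) {
    struct neg_entry *e = slot_for(c, path);
    if (e->used && strcmp(e->path, path) == 0)
        e->used = false;
}

/* Hypothetical back end: only one of the candidate include paths exists. */
static bool demo_backend(const char *path) {
    return strcmp(path, "/www/lib/util.php") == 0;
}

/* Simulate `requests` front-end requests, each searching the same include
   path in order; returns how many back-end lookups were needed. */
static unsigned long simulate_requests(int requests) {
    static const char *search[] = { "/www/a/util.php", "/www/b/util.php",
                                    "/www/lib/util.php" };
    struct neg_cache cache;
    memset(&cache, 0, sizeof cache);
    cache.child_lookup = demo_backend;
    time_t now = time(NULL);
    for (int r = 0; r < requests; r++)
        for (int i = 0; i < 3; i++)
            if (neg_lookup(&cache, search[i], now))
                break;                     /* found it, stop searching */
    return cache.child_calls;
}
```

For example, `simulate_requests(100)` needs 102 back-end lookups instead of 300, because only the first request pays for the two misses. Caching only misses keeps the sketch from ever serving stale file attributes, at the cost of needing an invalidation hook such as `neg_invalidate` on create and rename.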

The second translator bypasses replication.

This is a proof-of-concept translator for an idea that was proposed at FUDCon 2012 in Blacksburg, VA. The idea is simply that we can forward writes only to local storage, bypassing AFR (the GlusterFS replication translator) but setting the extended attributes (xattrs) ourselves to indicate that self-heal is needed. This gives us near-local write speeds, and we can later mount without the bypass to force self-heal when it’s convenient. We can do almost the same thing for reads as well.
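The bookkeeping half of the bypass idea can be sketched in C as follows. This assumes the AFR changelog format used in GlusterFS 3.x, where each brick's copy of a file carries one xattr per replica (named along the lines of `trusted.afr.<volume>-client-<N>`) holding three 32-bit big-endian counters of pending data, metadata, and entry operations; treat that layout as an assumption to check against the AFR source. The functions (`pending_add`, `pending_get`, `record_bypassed_write`, `needs_self_heal`) are hypothetical and only manipulate the 12-byte values in memory; actually writing them to the brick is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed changelog layout: 12 bytes holding three 32-bit big-endian
   counters of pending operations (data, metadata, entry). */
#define CHANGELOG_LEN 12
enum pending_type { PENDING_DATA = 0, PENDING_METADATA = 1, PENDING_ENTRY = 2 };

static uint32_t get_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Add delta to one counter of a changelog value, in place. */
static void pending_add(unsigned char value[CHANGELOG_LEN],
                        enum pending_type type, uint32_t delta) {
    assert((int)type >= PENDING_DATA && (int)type <= PENDING_ENTRY);
    unsigned char *p = value + 4 * (int)type;
    put_be32(p, get_be32(p) + delta);
}

static uint32_t pending_get(const unsigned char value[CHANGELOG_LEN],
                            enum pending_type type) {
    return get_be32(value + 4 * (int)type);
}

/* The local copy keeps one changelog per replica, counting operations that
   replica has not seen yet. A write sent only to local storage marks the
   remote replica (index `remote`) as one data operation behind. */
static void record_bypassed_write(unsigned char changelog[][CHANGELOG_LEN],
                                  int remote) {
    pending_add(changelog[remote], PENDING_DATA, 1);
}

/* Self-heal is needed if any counter for any replica is nonzero. */
static bool needs_self_heal(unsigned char changelog[][CHANGELOG_LEN],
                            int replicas) {
    for (int r = 0; r < replicas; r++)
        for (int t = PENDING_DATA; t <= PENDING_ENTRY; t++)
            if (pending_get(changelog[r], (enum pending_type)t) != 0)
                return true;
    return false;
}
```

After three bypassed writes with `remote = 1`, the local copy's changelog for replica 1 records three pending data operations and nothing else. Under the assumed format, a later mount without the bypass would be expected to see that replica 1 is behind and heal it from the local copy.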

They weigh in at 224 and 229 lines respectively, with some of that taken up by licenses and white space. Each took less than a day to write. Please bear in mind, though, that these are only prototypes. They exist to teach and to make a point, not, in their current form, to be used in production. Making them suitable for real-world use would at least double their size and triple the time needed for testing. That’s still orders of magnitude better than what you’d have to do to implement similar functionality in other projects that claim to be competitive with GlusterFS, and the result is still much more functional than one of those stripped-down jokes that just have “FS” in the name to mislead users. If you’re a developer and you think you can do distribution or replication or caching or anything else better than GlusterFS, show us. Translators let you implement your ideas quickly, and then do a true “apples to apples” comparison against what came before. That could revolutionize distributed storage, but only if people take advantage of the opportunity.


HekaFS 0.7-23 Released

Today Kaleb put together a new release, based on various bits and pieces he and I have done over the last while. The big thing is that the SSL-capable transport/protocol bits are now based on GlusterFS 3.2.5 plus the latest and greatest version of the patch that’s still chugging through the GlusterFS 3.3+ queue. There are also several changes to init scripts, man pages, etc.; nothing earth-shattering, just the usual “keeping up with the world” kind of stuff. If you’re using Fedora 16 or Rawhide, the new version should be coming soon to a repository near you. If you’re using something else, you can pick up the tarball or clone the git repository.

P.S. The intention among the GlusterFS/HekaFS developers is still to roll pieces of the HekaFS functionality into GlusterFS one by one until there’s nothing left. The SSL code is going first, the other translator pieces will follow soon-ish, and the management stuff (which is turning out to be tricky) will probably be last. Meanwhile, patches etc. will continue to be applied, with the current F16/3.2.5 code on a fairly slow-moving branch and everything else on a slightly faster-moving trunk.