Archive for May, 2011

Network Encryption

Over the course of many (too many) presentations about CloudFS, I’ve started to think about CloudFS 1.0 in terms of four features: namespace isolation, ID isolation, storage encryption and network encryption. Yes, there are other bits too, but those are either there to support these four (most of the management stuff is this way) or they require little CloudFS-specific development effort (e.g. quota). Namespace isolation and storage encryption have been there for a while, though I expect Edward to make the storage encryption even better. Kaleb has been working on the ID isolation, and recently checked in a bunch of code that’s feature-complete or very close to it. Yesterday, having just worked on SSL support for the management stuff, I decided to work on SSL for the actual data transport as well. I expected it to take at least a week to get anything at all working, but I’m pleased to report that after just one day it seems to be working quite well. I’m sure I’ll find some bugs as I explore some of the more difficult cases (e.g. multiple connections to a single glusterfsd process), plus there’s a whole lot of integration and patch-pushing still to be done, but the code basically seems ready to use. Performance according to my very preliminary tests is approximately 2x what you’d get using an ssh tunnel to get the same effect, though still only 1/3 of what you’d get with the regular socket transport.

What this means, aside from the fact that this one feature itself is nearing code-complete, is that the whole of CloudFS 1.0 is nearing code-complete. Coding is likely to occupy less time than testing, documenting, packaging, and so on. That will make me grumpy. Hopefully it won’t take all that long, though, and then work can truly begin on CloudFS 2.0 – for which the “dynamo” translator is likely to be a major focus. For a teaser you can look near the end of my Summit presentation; I’ll probably post more details here as currently-vague plans firm up a bit. I also have some other ideas up my sleeve – mostly performance tweaks – that I might write about as well. For now, though, I’m just pleased as punch about how much CloudFS development has accelerated lately.


New Management Interface

One of the sad things about working on filesystems is that it’s hard to demonstrate what makes your file system so special. So you wrote some files and then read them back? Big deal. Everyone does that. Performance graphs might impress one’s fellows, and demonstrating survival of a disk/server failure might even get a raised eyebrow from a non-specialist, but that’s about it. This matters not only for impressing others, but for getting that feeling of progress as you develop the code. This is the kind of work where you work on one piece and then another and then another, none of them seeming to do very much, until you bring them all together and then BAM! Suddenly you start getting visible results, usually not very good results to start with, but still something besides more source code sitting in your repository. That’s why writing management code, which can be just drudgery of a different sort if you do it all the time, can be so satisfying as a change of pace. Add a few lines, test, see visible results. Amazing.

Over the last few days, in the post-Summit lull, I’ve been working on the CloudFS management code. Most of it was written by Kaleb, but he was itching to tackle something more challenging, so I did some of the final integration with the rest of what’s there. It’s still very primitive by professional user-interface standards, but that’s kind of by design. Many times while doing this, I stopped to ask myself whether I was working on presentation or plumbing. I really want to get all of the plumbing right but, whenever I realized I was working on presentation, I changed direction because some day we’ll need to get a real user-interface person in (any volunteers?) and anything I do in that area now will probably be replaced. The immediate goal is something that works, and that’s just barely pleasant enough that I as a developer won’t mind using it as I go about my other work. Yep, dog food again. All functionality is available via a command-line interface, as described in the manual, but I also have actual screen shots (click images for full-size versions).

cluster management screen
Cluster Management Screen

This is where you can see which nodes are part of your GlusterFS/CloudFS cluster (“pool” in Gluster terminology) and add new nodes as long as they have the software installed (and ssh access).

volume management screen
Volume Management Screen

This is probably the most useful screen for most people. It lets you see what volumes you have defined, create new ones, start/stop them, and so on. Starting and stopping is done the CloudFS way, which includes rewriting the configuration files automatically, integrating with our own daemon- and port-management code, etc. instead of using Gluster’s mechanisms for that (which are inadequate for CloudFS’s needs and unlikely to improve in that direction). There’s also a client-side mount script that uses this different infrastructure to generate the client-side configuration and map to the proper ports, so that users don’t need to be aware of all that complexity.

tenant management screen
Tenant Management Screen

Now we start getting into actual CloudFS-specific functionality. This is where you can list, add, and remove CloudFS tenants. Don’t be scared off by the appearance of a password in this screen. That’s just a temporary artifact of the crummy mechanisms GlusterFS already has for access control. By the time this gets released “into the wild” the password will be replaced by a certificate (generated using a client’s private key and signed by the provider) which is used for the low-level SSL transport. Also, everything in the web interface will be done using HTTPS with separate management-specific certificates on both the server and client sides. I’ll work out the interface for some of that as I develop the underlying functionality, but the main point is that the final version will be much more secure than what’s shown above.
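To make that concrete, the general shape of such a flow can be sketched with stock openssl commands. This is purely illustrative, not CloudFS’s actual tooling; all the file names and subjects are made up, and a throwaway self-signed CA stands in for the provider’s real one.

```shell
# Provider side (demo only): create a CA key and self-signed CA cert.
openssl req -x509 -newkey rsa:2048 -nodes -keyout provider-ca.key \
    -subj "/CN=demo-provider" -days 365 -out provider-ca.crt
# Tenant side: generate a private key and a certificate signing request.
openssl genrsa -out tenant.key 2048
openssl req -new -key tenant.key -subj "/CN=tenant1" -out tenant.csr
# Provider side: sign the tenant's request, yielding the tenant cert.
openssl x509 -req -in tenant.csr -CA provider-ca.crt -CAkey provider-ca.key \
    -CAcreateserial -out tenant.crt -days 365
```

The resulting tenant.crt can then be presented over SSL and verified against the provider’s CA, which is the property the eventual CloudFS mechanism is after.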

access management screen
Access Management Screen

Finally, the real key to CloudFS multi-tenancy. This is where you select which volumes a particular tenant can use; I plan to have a complementary screen to select which tenants can access a particular volume, but it doesn’t exist yet. This is more than superficial. When a CloudFS volume is started from the volume-management screen, or via the corresponding cfs_start_volume CLI, part of what happens is that the original GlusterFS “volfiles” are processed using the current list of allowed tenants to generate the CloudFS volfiles that will actually be used by the glusterfsd processes we start. Those processes then export tenant-specific volumes for clients to use based on the tenant identity and credentials provided to the mount script, and those volumes are configured to use tenant-specific mapping of user and group IDs. This is how we keep tenants strongly isolated from one another. The SSL transport module will also be added as part of this process, as will server-side helper translators to support the on-disk encryption (though the encryption itself and the keys it uses remain entirely on the client side).

That’s it for now. As I said, it’s still rather primitive, but at least it’s something I can show people. Some parts of it might even be useful for GlusterFS users who don’t even need CloudFS’s extra features. If you agree, feel free to grab the source code and hack away, or let me know in the comments here and we can talk about what needs to be done about splitting it from the rest of CloudFS.


Red Hat Summit Slides

Yesterday, I gave a presentation about CloudFS at Red Hat Summit in Boston. During the day I joked about how it had been scheduled at 5:30pm, with free food and beer elsewhere, so I didn’t have to worry about presenting to a large audience. How right I was. We ended up with about 40 people in a room that seats 200 – yes, I counted – and those who did attend seemed very tired from a full day’s worth of other sessions. It was a bit depressing, to be quite honest, especially when some folks got up and left halfway through. Maybe it was for the free beer, maybe it’s because they realized this session wasn’t especially relevant to them, maybe it’s because I just sucked. Personally I would rate my performance as about average. I’m not a great speaker by any means, but I felt that I was able to present the material pretty clearly without the pauses and stutters and nervous repetition I saw from other speakers. There was some good back-and-forth with those who did stay, as well, and it was certainly better than not having been there at all, so I’ll try to focus on the positive.

The best thing that came out of this was probably not the in-person presentation itself but the opportunity to create a better set of slides than I had before. Here they are, in both OpenOffice and PDF forms. This includes the hidden and backup slides, plus notes that include most of what I said during the presentation and even some of the answers to questions, so if you’re reading the ODP version be sure to use a view that lets you see everything (the PDF version was already generated that way). I’m rather proud of these, actually, as I feel that they give the clearest picture to date of what CloudFS is about. Many thanks to Mark Wagner who did the performance part of the presentation (which meant that he had to put up with not having the same document on both the big and small screens due to mismatched-resolution problems with the conference setup), and to Ben England who actually did the vast majority of the testing to generate those numbers.

Just to be clear on what CloudFS’s goals are, I really really wish I could be working more actively on the “dynamo” and “paphos” (multi-site replication – so named because of this) pieces, which only occupy two backup slides at the end of the presentation. Those are the features that inspired me to start the project, and I think eventually they’ll be the most compelling for users, but in the interim I just saw these huge functional gaps between what current distributed filesystems provide and what’s needed to deploy them properly in a cloud environment. Frankly that stuff’s kind of boring. It involves a lot of mucking about with mundane GlusterFS implementation details, and a lot of “anyone could write this” kinds of code to interface with libraries or handle management functions, with little in the way of algorithmic excitement except for a few encryption-related bits. It’s also necessary. I firmly believe that this is the stuff people need, right now, before we go off and do the more cutting-edge stuff. Stay tuned, and we’ll get to the good part. I promise.


Building Your Own Dropbox Equivalent

My previous dog food post did generate a couple of requests for more detailed instructions on how to build your own private cloud storage repository, so here goes.

First, you need to have a server somewhere that will always be on, or at least on as much of the time as you want to access your cloud storage. I happen to have a server in the Rackspace Cloud that I leave running for various purposes at $10/month. If you already have a cloud server or VPS that you’re running somewhere, or a home/work machine, that works fine too.

Next, you need to be running GlusterFS on your server and clients. There are already pre-packaged versions available for some Linux distributions, though some are more current than others. For example, Fedora is at 3.1.3 which is pretty close to current, while Ubuntu is at 3.0.5 which is simply too old to be useful. Downloads are also available from Gluster themselves and elsewhere. However, you’re highly likely to run into a problem with many of the pre-packaged versions. Prior to 3.1.4, the servers would not accept connections from non-privileged ports, and if you’re using any sort of NAT/tunnel/VPN you’ll probably be assigned just such a port. At least in 3.1.4 they added options to override this misguided code (which doesn’t even check the port’s privilege status the right way BTW) but my choice was to remove it entirely and build my own RPMs. I’ll provide patches and fixed RPMs soon; bug me if I forget. Anyway, install whatever 3.1+ version you want and move on to the next step.
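If you’d rather stick with a stock 3.1.4+ build than rebuild like I did, the override is a one-line server-side option. If memory serves it looks something like the following in /etc/glusterfs/glusterd.vol, but check your version’s documentation, since I may have the exact spelling or location wrong:

```
option rpc-auth-allow-insecure on
```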

At first, you’ll want to create your volumes and start your server-side daemons the “normal” way, something like this.

# mkfs -t ext4 /dev/sdwhatever
# mkdir -p /bricks/mybrick1
# mount -o noatime,user_xattr /dev/sdwhatever /bricks/mybrick1
# gluster volume create mycloud my.external.ip.address:/bricks/mybrick1
# gluster volume start mycloud

At this point you would actually be able to mount the “mycloud” volume from any GlusterFS client, unless you have firewall issues (which you often will in a public cloud including Rackspace). Here’s one simple way to get around that, though I actually don’t recommend this method for reasons I’ll get to in a moment.

# netstat -lpn | grep glusterfsd
# iptables -I INPUT -p tcp --dport 24009 -j ACCEPT

24009 is just the port I happened to get. Also, you’d probably want to make the iptables rule more specific to the hosts you use, save it as part of your private config, yadda yadda. However, the reason you probably don’t want to do this is that it provides no security at all. Instead, you’ll want to run this through some sort of tunnel, and in that case you’ll effectively be making a local connection to the server. Here’s how you’d do it with ssh.

# ssh -L 24009:localhost:24009 my.external.ip.address

The other part of this is avoiding some of Gluster’s port-mapping silliness. Grab the client-side “volfile” describing the mount from /etc/glusterd/vols/mycloud/mycloud-fuse.vol and copy it to whatever machines you’ll be mounting on. Edit the “protocol/client” part of the copy to look something like this:

volume mycloud-client-0
    type protocol/client
    option remote-host localhost
    option remote-port 24009
    option remote-subvolume /bricks/mybrick1
    option transport-type tcp
end-volume

Now your mount command changes to something like this.

# mount -t glusterfs my.external.ip.address:mycloud /mnt/mycloud # glusterfs way = wrong
# glusterfs -f ~/mycloud-fuse.vol /mnt/mycloud # cloudfs way = right

At this point you have a remotely accessible directory with encrypted communication, but data on the remote server that’s still accessible to anyone who can get into that server. In other words, you’ll have something practically equivalent to most of the cloud-storage offerings out there – including the ones you pay for. What you need, both to preserve your privacy and to comply with many companies’ information-security policies (such as Red Hat’s), is pure client-side encryption. That’s where CloudFS comes in . . . or at least part of it. At this point you need to be a bit of a coder or fairly advanced sysadmin, because I’m going to make you pull the code and build your own RPM. I’ll provide some RPMs shortly; also, note that you specifically want the “aes” branch for now. To make it work for you, you’ll need to edit that client-side volfile again, adding a new “encryption/crypt” stanza at the end like this:

volume mycloud-crypt
    type encryption/crypt
    option key %0123456789abcdef0123456789abcdef
    subvolumes mycloud-client-0
end-volume

The key can be whatever you want, at 128/192/256 bits, and can also be stored in a file (use a first character of “/” instead of “%” in the volfile). One easy way to generate such a key is like this:

dd if=/dev/urandom bs=16 count=1 | od -tx1 -An | tr -d '\n '
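If you’d rather use the file form mentioned above (a key option starting with “/” instead of “%”), the same trick just writes the key to a file. Here’s a runnable sketch for a 256-bit key; the volfile path in the comment is illustrative.

```shell
# Write a 256-bit (32-byte) key as 64 hex characters into a file readable
# only by its owner. The volfile line would then reference it by path:
#     option key /etc/cloudfs/mycloud.key
dd if=/dev/urandom bs=32 count=1 2>/dev/null | od -tx1 -An | tr -d '\n ' > mycloud.key
chmod 600 mycloud.key
```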

In any case, the resulting volfile contains nothing but the two stanzas I’ve shown, in that order. All of the “performance” translators which can interfere with proper operation of the encryption code need to be stripped out.
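In other words, the whole client-side volfile ends up looking like this sketch, assembled from the two stanzas shown earlier plus the end-volume terminators that GlusterFS volfile syntax expects:

```
volume mycloud-client-0
    type protocol/client
    option remote-host localhost
    option remote-port 24009
    option remote-subvolume /bricks/mybrick1
    option transport-type tcp
end-volume

volume mycloud-crypt
    type encryption/crypt
    option key %0123456789abcdef0123456789abcdef
    subvolumes mycloud-client-0
end-volume
```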

At this point you should be able to mount using your custom volfile to have full encryption both on the network and on disk, with all of the keys safely on your own client system(s). Performance might not be all that great for some kinds of operations, but I’m using it myself and it seems adequate for most purposes. Some day I’ll finish the UID-mapping translator so that you can use this from different UIDs on different machines without getting permission errors, and I’ll finish the built-in SSL transport so you can connect directly instead of needing an ssh tunnel. Then it’ll be really cool, but you know what might be even cooler? With GlusterFS 3.2 you could even replicate this data across two different cloud providers, giving you all of that “multi cloud” goodness that has been all the rage since the EBS outage. With CloudFS 1.0 you’ll be able to do it an even better way, with better consistency and decent performance and so on. At that point I’ll seriously start to question the sanity of anyone who’s using some other solution that doesn’t offer the same performance and the same levels of data protection from both accident and intrusion, or which isn’t open-source like everything I’ve talked about.

UPDATE: Some people on the Hacker News thread (thanks Jesse) have rightly pointed out that what I’ve described here isn’t really equivalent to Dropbox. Rather, it’s a directly mountable filesystem, which I feel is even better. If you really want something equivalent to Dropbox, you’d have to do two things. To get the sync/work-offline functionality, you’d have to set up this plus a second directory and use something like rsync/Unison/lsyncd to sync between them. To get a GUI, there must be a few dozen web apps you can plop onto the server pointing at a local mount of the same volume. Note, though, that you give up some security when you do this, because anybody who can get onto the server can bypass encryption by looking in the mount used by the web app. Since the whole point is to have better security than Dropbox, not to repeat their mistakes, I can’t recommend that. The nearest thing I can think of would be to build the encryption pieces directly into the web app (GlusterFS does actually make this possible) but even then the data would still exist transiently in the web app’s memory on the server, so it’s not much of an improvement. If anybody knows of a good way to have a web UI that does the encryption/decryption entirely on the client side – big blob of JavaScript? – please let us know in the comments. It might be an interesting project, though probably not one I’m interested/qualified to attempt personally.