I tried running GlusterFS on Rackspace’s cloud, to compare the results with what I’d seen on Amazon. Since Rackspace had shown better raw I/O performance before, I was expecting better filesystem performance as well. That turned out not to be the case, because I ran into a funny cloud-computing pitfall. After I’d built the code and dealt with the firewall issues I mentioned before, I did a run and got much lower numbers than I’d expected. Odder still, the I/O seemed very bursty. Watching one of the two servers, I’d see a burst of I/O every few seconds, with practically nothing in between.

I put this out of my mind temporarily, though, and as I started poking around I realized that I had used the instances’ public IP addresses to connect everything together. Knowing that performance using the internal addresses is usually better, I tried to switch over, but it didn’t work. It turned out that the client could see and connect to server1, but not server2. I double-checked the firewall rules, and everything seemed fine, but there was no connectivity. Not even ping. How strange. Pragmatic guy that I am, I decided to set this mystery aside as well. I rearranged which instances were servers and which instance was the client, so that the node which could see both others (formerly server1) was now the client. Performance was now slightly better, but not much, and still bursty. There was something interesting about the burstiness, though. I/O seemed to alternate between the two servers. Has anybody else guessed where this is going?

At this point I strongly suspect that two of my instances – originally server2/client, later the two servers – were actually on the same physical machine. Thus, they were not only contending for resources but actually thrashing a bit when I tried to run high-throughput tests. I did some more experiments and wasn’t able to prove this theory conclusively, but the evidence seems highly suggestive. This seems like a bad VM-placement policy to me. When two instances are started close together in time (as these were), the optimal policy would most often be to spread them across machines, not to concentrate them on one. Cloud providers, please take note.

Below are some notes, mainly for my own future reference but potentially useful to others, about how to build and run this kind of thing on the Rackspace cloud.

(1) Logins. The most convenient way to work on a bunch of machines all at once is to set up some ssh key pairs (as AWS does – again they seem to have thought this through more). I did this manually, generating the key pair on my own machine and pushing it out via scp. Another gotcha on Rackspace is that you need to edit /etc/ssh/sshd_config to define AuthorizedKeysFile and enable ChallengeResponseAuthentication, then restart sshd. Make sure you keep a login active when you do this (e.g. using the extra-slow instance console) because if you mess it up you might not be able to log in again. I had to recreate one instance when I made this mistake.
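For reference, here’s roughly what that sequence looks like; the key file name and host names are just placeholders for whatever yours are called. Run it from your own machine while password logins still work:

ssh-keygen -t rsa -f ~/.ssh/cloud_key
for host in server1 server2 client1; do
    ssh root@$host 'mkdir -p .ssh && chmod 700 .ssh'
    scp ~/.ssh/cloud_key.pub root@$host:.ssh/authorized_keys
done

Then make the sshd_config edits on each instance and restart sshd, keeping that second session open until you’ve confirmed key logins actually work.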

(2) Networking. One of the first things you’ll probably want to do is build an /etc/hosts that includes all instances’ public and private addresses, and make sure it’s identical everywhere. Then you’ll have to deal with the firewall rules. The commands I used looked pretty much like this:

iptables -I INPUT -p tcp -s server2 -j ACCEPT

Test all of the paths, so you don’t get caught by the same instance-placement issue I did (or at least so you understand any limitations that result).
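Concretely, for three nodes the hosts file and rules end up looking something like this (the addresses here are made up, and each node needs the equivalent rules for its own peers):

# /etc/hosts, identical on every node
10.176.0.11   server1
10.176.0.12   server2
10.176.0.13   client1

# on server1
iptables -I INPUT -p tcp -s server2 -j ACCEPT
iptables -I INPUT -p tcp -s client1 -j ACCEPT

# quick all-pairs connectivity check, run on each node
for host in server1 server2 client1; do ping -c 1 $host; done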

(3) Building. I’ve developed a standard formula for getting all of the tools I need onto a system. Some of these are not actually necessary for GlusterFS (some are for Ceph), but they’re good to have anyway.

yum install gcc gcc-c++ libtool binutils glibc-headers kernel-headers autoconf automake bison flex make fuse fuse-devel
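With those in place, building GlusterFS itself is the usual autotools routine. Grab whichever release tarball you prefer from the GlusterFS site; the version below is just a stand-in:

tar xzf glusterfs-2.0.x.tar.gz
cd glusterfs-2.0.x
./configure
make && make install

Note that the default install prefix is /usr/local, which is where the paths in the next item come from.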

(4) Running. GlusterFS really likes to put everything, including logs, into /usr/local. Make sure you put your server config files into /usr/local/etc/glusterfs/glusterfsd.vol, using whichever example you prefer from the GlusterFS site, and distribute them everywhere. I’ve found it’s easiest to keep the client config file glusterfs.vol in the same place on one of the servers, and mount on the clients like this:

glusterfs --volfile-server=i-server1 --volfile-server-port=6996 /home/gfs
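In case it helps, here’s the general shape of a minimal glusterfsd.vol in the old volfile syntax; the export directory and the wide-open auth rule are placeholders you’d want to adjust:

volume posix
  type storage/posix
  option directory /export/gfs
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.listen-port 6996
  option auth.addr.locks.allow *
  subvolumes locks
end-volume

Port 6996 is the old glusterfsd default, which is why it shows up in the mount command above.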