Someone on Slashdot asked how to do filesystem snapshots on Linux. Many respondents pointed out, somewhat reasonably, that the current setup with ext3 on raw disks didn’t support that and that the poster should migrate to another filesystem or to LVM to get that functionality, but nobody seemed to have much to say about how to do that initial migration non-disruptively. I’ve had to do filesystem migrations involving many terabytes and many millions of files, and it’s a non-trivial exercise. There are a lot of ways to get it wrong and ruin your data. Based on things I’ve done since then, here’s the approach I’d investigate first if I had to do it now.

One of the less obvious tricks I’ve learned to do with GlusterFS is to run it all on one machine. The design very deliberately lets you stack “translators” on top of one another in all sorts of arbitrary ways, and the network protocol modules use the same translator interface as well. I often run the “distribute” translator, or those I’ve written, directly on top of the “storage/posix” local-filesystem translator. It works fine, and it’s much more convenient for development than having to run across machines.

GlusterFS also has a replication translator, and one of the functions it necessarily provides is “self-heal” to rebuild directories and files on a failed (and possibly replaced) sub-volume. Do you see where I’m going with this? You can set up an old local filesystem and a new (empty) local filesystem as a replica pair under GlusterFS, and then poke it to “self-heal” all the files from old to new while the filesystem is live and in active use. GlusterFS doesn’t care that the two filesystems might be of different types (e.g. ext3 vs. btrfs) and/or on different kinds of storage (e.g. raw devices vs. LVM), so long as they both support POSIX locks and extended attributes. All the while, it keeps track of operations in progress on the composite filesystem, so the self-heal activity is effectively transparent to users, who just see essentially the same filesystem they always saw. When you’re done, you just take GlusterFS back out of the picture and mount the new, fully populated filesystem directly.

Here’s a configuration file to do just what I’ve described. It takes an existing directory at /root/m_old and combines that with an empty directory at /root/m_new to create a replica pair.
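Exact option names vary a bit across GlusterFS releases, so treat this as a sketch of the shape rather than a known-good file:

# The old, populated filesystem.
volume m_old-posix
  type storage/posix
  option directory /root/m_old
end-volume

# features/locks supplies the POSIX-lock support that replicate needs.
volume m_old-locks
  type features/locks
  subvolumes m_old-posix
end-volume

# The new, empty filesystem.
volume m_new-posix
  type storage/posix
  option directory /root/m_new
end-volume

volume m_new-locks
  type features/locks
  subvolumes m_new-posix
end-volume

# Export both sub-volumes over loopback; see the second note below
# for why this server/client detour seems to be required.
volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.m_old-locks.allow 127.0.0.1
  option auth.addr.m_new-locks.allow 127.0.0.1
  subvolumes m_old-locks m_new-locks
end-volume

volume m_old-client
  type protocol/client
  option transport-type tcp
  option remote-host 127.0.0.1
  option remote-subvolume m_old-locks
end-volume

volume m_new-client
  type protocol/client
  option transport-type tcp
  option remote-host 127.0.0.1
  option remote-subvolume m_new-locks
end-volume

# Replicate across old and new; self-heal copies from old to new.
volume migrate
  type cluster/replicate
  subvolumes m_old-client m_new-client
end-volume

Here are the commands to mount it and force self-heal: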

# mount the composite volume using the volfile above
glusterfs -f /usr/etc/glusterfs/migrate.vol /mnt/migrate
# stat'ing every file and directory triggers replicate's self-heal on each one
ls -alR /mnt/migrate

I should warn people that I’ve only done a very basic sanity test on this. It seems to work as expected for a non-trivial but still small directory tree, but you’d certainly want to test it more thoroughly before using it to migrate production data (and of course you should absolutely make sure you have a backup that works before you attempt any such thing).
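One cheap check, once the tree walk finishes and the volume is quiet, is to compare the two back-end directories directly. An rsync dry run works for this; it prints nothing if the trees match, though it may flag timestamp differences and it doesn’t look at the extended attributes GlusterFS adds for its own bookkeeping:

# -a archive-style comparison (perms, owners, symlinks), -c compare file
# contents by checksum, -n dry run (change nothing), -i itemize any
# difference found; no output means the trees match
rsync -acni --delete /root/m_old/ /root/m_new/

There are a couple of non-obvious things about the configuration that I should also point out.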

  • Both filesystems should, as mentioned previously, support POSIX locks and extended attributes. You need to load the features/locks translator to use the former.
  • For some reason this doesn’t seem to work without the protocol/server and protocol/client modules in the stack, even though everything is on one machine. That hasn’t been the case in my experience with other composite translators, and it shouldn’t be necessary here either, but at least the networking is all over loopback so the overhead isn’t too terrible.

I’m sure there are other live-migration approaches that would work just as well, if not better; I suspect there’s at least one approach using union mounts, for example. There are also a lot of approaches I can think of that would fall prey to subtle issues involving links (or other things that aren’t plain files), sparse files, and so on. It’s a lot easier to suggest an answer than to implement one that actually works. I’ve even thought (since SiCortex) of writing a FUSE filesystem specifically to do this kind of migration, but it would require a significant effort. This seems like an easy and fairly robust way to do it using tools that already exist.
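Those subtle cases are exactly what I’d probe first in any more thorough test. A quick sketch, using made-up file names, run against the composite mount:

# create the cases that naive copy schemes tend to get wrong
cd /mnt/migrate
echo data > plain
ln plain hardlink                              # hard link: count should survive
ln -s plain symlink                            # should stay a symlink, not a copy
dd if=/dev/zero of=sparse bs=1 count=1 seek=1G # ~1GB file that is almost all hole
ls -alR /mnt/migrate > /dev/null               # walk the tree once more for good measure

# then inspect the new back end directly
stat -c %h /root/m_new/plain                   # expect a link count of 2
readlink /root/m_new/symlink                   # expect "plain"
du -h --apparent-size /root/m_new/sparse       # apparent size should be ~1GB
du -h /root/m_new/sparse                       # allocated size should be far smaller

If any of those come out wrong, better to find out here than halfway through a multi-terabyte migration.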