It looks like there’s a new set of sickeningly cute and self-congratulatory slides for Sun’s ZFS, about which I’ve written before. At least one of my predictions seems to have come true: if you look at slide 24 (“Object-Based Storage”), it’s quite clear that there really is a volume manager lurking in there, even though they play on people’s fears of volume managers and make it seem that ZFS has eliminated the need for them. Personally, I think such misleading claims are a blot on an otherwise excellent project. There’s a lot of great stuff in ZFS, from streamlined management to data integrity to performance and scalability (though I do wonder why someone supposedly involved in “SAN Engineering Product Development” doesn’t seem bothered that ZFS is not a true SAN filesystem), but if I were one of the engineers on that project I’d really be getting on marketing’s case about how it’s (mis)represented.

Enough about that, though; I already covered it last time. What’s nearer and dearer to my heart this time, since I’m working in the Continuous Data Protection space, is the claim that ZFS offers nearly infinite and nearly free snapshots. It makes me wonder how they deal with space reclamation, an issue that isn’t even mentioned in any of the slides I’ve seen. If you’re keeping old copies of data around, your filesystem is eventually going to fill up. Sooner or later, no matter how much space you have allocated, you’re going to need to toss out some old data to make room for new; it’s not very nice to start throwing errors at whoever’s writing the new data because the disk is full of months-old data in which nobody has recently expressed any interest.

If each block can potentially be referenced both by the active filesystem and by several snapshots/clones, how do you know when it’s safe to get rid of it? Reference counts? Think for a moment about how important it is to keep those refcounts fully up to date with every transaction, and about the potential performance cost of making sure that they are. If, on the other hand, each block can belong to only one “view” (active, snapshot, or clone) of the filesystem, then massive copying is necessary whenever a snapshot or clone is created, and they could hardly be as instant or as infinite as claimed. I’m not saying clever approaches don’t exist to ameliorate this problem (how we deal with it is an important part of our “special sauce” at Revivio), but it is most definitely a core problem for this kind of system, and one that anyone bragging about such capabilities should at least mention.
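To make the refcounting tradeoff concrete, here’s a toy sketch of the bookkeeping I’m describing. This is emphatically not how ZFS (or Revivio) actually does it; all the names (`SnapshotStore`, `write`, `snapshot`) are made up for illustration. The point is that even in this naive model, a block can only be reclaimed when every view that references it is gone, and the “instant” snapshot still has to touch a refcount for every shared block:

```python
class SnapshotStore:
    """Toy copy-on-write store with per-block refcounting.
    A block is freed only when no view (active filesystem,
    snapshot, or clone) still references it."""

    def __init__(self):
        self.refcount = {}    # physical block id -> reference count
        self.active = {}      # logical offset -> physical block id
        self.snapshots = {}   # snapshot name -> {offset: block id}
        self.next_block = 0
        self.freed = []       # block ids returned to the free pool

    def write(self, offset, _data):
        # Copy-on-write: the old block loses one reference; it is
        # reclaimed only if no snapshot still pins it.
        old = self.active.get(offset)
        if old is not None:
            self._decref(old)
        new = self.next_block
        self.next_block += 1
        self.refcount[new] = 1
        self.active[offset] = new

    def snapshot(self, name):
        # "Instant" snapshot: no data is copied, but every shared
        # block's refcount must be bumped -- and kept transactionally
        # consistent with the data. That is the hidden cost.
        for block in self.active.values():
            self.refcount[block] += 1
        self.snapshots[name] = dict(self.active)

    def delete_snapshot(self, name):
        # Space reclamation: dropping a snapshot releases its
        # references, possibly freeing blocks at last.
        for block in self.snapshots.pop(name).values():
            self._decref(block)

    def _decref(self, block):
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            del self.refcount[block]
            self.freed.append(block)  # no view uses it; safe to reclaim


fs = SnapshotStore()
fs.write(0, "a")              # allocates block 0
fs.snapshot("snap1")          # block 0 now pinned by two views
fs.write(0, "b")              # block 1; block 0 survives for snap1
assert fs.freed == []         # nothing reclaimable yet
fs.delete_snapshot("snap1")
assert fs.freed == [0]        # only now can block 0 be reused
```

The sketch shows both horns of the dilemma: `snapshot` avoids copying data but does work proportional to the number of live blocks, and `write` must consult refcounts before reusing anything. Real systems have cleverer schemes, but the invariant being maintained is the same.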