The question was posed to me via email, and I’m being evil by posting it without permission, but it’s one I’ve been asked quite a few times, so I’m pretty optimistic that the author will forgive me.

Have you ever thought of the implications of a file system design that utilizes a content based file handle (a crypto hash of the raw file data)?

Heh. Only all the time. ;-)

Seriously, though, content-based addressing has some interesting advantages and drawbacks. On the one hand, it makes a large class of consistency problems just go away, and that’s a Major Good Thing. The obvious thing to do would be to arrange all data in a filesystem hierarchically, much like a phase tree. To write a block you’d first insert the data block, then a new version of its inode to point to the new block, then a new version of the directory to point to the inode, all the way up to the root. It would be impossible to get an inconsistent state in such a system. Cool, huh?
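To make that write path concrete, here's a minimal Python sketch of the idea, not a real filesystem. Everything in it is an assumption made for illustration: the in-memory `Store`, the JSON encoding of inodes and directories, the choice of SHA-256, and the helper names `write_block` and `read_block`. The only point is the ordering: data block first, then a new inode, then new directories all the way up to a new root.

```python
import hashlib
import json


class Store:
    """Toy content-addressed store: each object is keyed by the hash of its bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.objects[key] = data  # existing objects are never modified in place
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


def put_node(store, node):
    # Hypothetical encoding: inodes and directories are stored as canonical JSON.
    return store.put(json.dumps(node, sort_keys=True).encode())


def get_node(store, key):
    return json.loads(store.get(key))


def write_block(store, root, path, index, data):
    """Write `data` as block `index` of the file at `path`; return the NEW root hash."""

    def update(dir_key, names):
        entries = dict(get_node(store, dir_key)["entries"]) if dir_key else {}
        name = names[0]
        if len(names) == 1:
            blocks = list(get_node(store, entries[name])["blocks"]) if name in entries else []
            block_key = store.put(data)  # 1. insert the data block
            blocks.extend([None] * (index + 1 - len(blocks)))  # pad holes if needed
            blocks[index] = block_key
            entries[name] = put_node(store, {"blocks": blocks})  # 2. new inode version
        else:
            entries[name] = update(entries.get(name), names[1:])
        return put_node(store, {"entries": entries})  # 3. new directory version

    return update(root, path)


def read_block(store, root, path, index):
    key = root
    for name in path:
        key = get_node(store, key)["entries"][name]
    return store.get(get_node(store, key)["blocks"][index])


store = Store()
root1 = write_block(store, None, ["docs", "a.txt"], 0, b"hello")
root2 = write_block(store, root1, ["docs", "a.txt"], 0, b"world")
```

After the second write, `root1` still names a complete, consistent tree containing the old data, and `root2` names the new one. A reader holding either root can never see a half-applied update, because nothing reachable from a root is ever changed.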

The problem is that it’s also very difficult to make sure that you’re using the filesystem’s current state. While a certain amount of “slack” is generally allowable, especially in a publication-oriented system, users’ and applications’ tolerances for staleness are limited. The only way to be sure that what I’m reading is current as of a particular point in time is to work all the way down from the root to the block I want, but that has horrendous performance implications. There’s also a logistical problem with all those old versions of inodes and directories remaining in the system essentially forever because nobody can ever really know whether it’s safe to drop them. Lastly, while content-based addressing in some ways makes it easy to continue writing despite disconnection and network partitions, there’s still a very serious problem of how to reconcile the two halves of a split brain properly when everything’s connected again.
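The cost of "working all the way down from the root" is easy to see in a toy model. The sketch below is again purely illustrative: the global `objects` dict stands in for a possibly remote object store, the JSON node layout is made up, and `lookups` simply counts fetches. A read that wants to be sure it is current has to fetch every object on the path, not just the block itself.

```python
import hashlib
import json

objects = {}  # hash -> bytes; stands in for a (possibly remote) object store
lookups = 0   # number of fetches performed so far


def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    objects[key] = data
    return key


def get(key: str) -> bytes:
    global lookups
    lookups += 1
    return objects[key]


def put_node(node) -> str:
    return put(json.dumps(node, sort_keys=True).encode())


def read_block(root: str, path: list, index: int) -> bytes:
    """Resolve a block starting from the root, fetching every object on the way."""
    key = root
    for name in path:
        key = json.loads(get(key))["entries"][name]  # one fetch per directory
    inode = json.loads(get(key))                     # one fetch for the inode
    return get(inode["blocks"][index])               # one fetch for the data


# Build a tiny tree by hand: /docs/a.txt with a single block.
block = put(b"hello")
inode = put_node({"blocks": [block]})
docs = put_node({"entries": {"a.txt": inode}})
root = put_node({"entries": {"docs": docs}})

data = read_block(root, ["docs", "a.txt"], 0)
```

Here a two-component path costs four fetches for one block. A client that cached the block's hash could get away with a single fetch, but it would have no way of knowing whether that hash is still the one the current root leads to.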

Some of these problems strike me as both important and unsolvable in a purely content-based-addressing scheme. At some point it seems that you need some part of the system to be based on a scheme that provides stronger consistency, and that’s where I spend a lot of my “think time” on the subject. I’m pretty sure that there are other people with a perfectly good handle on the content-based part of such a hybrid system, but it’s still pretty unclear exactly what the characteristics of the other part would be.