ZFS gets dedup

Jeff Darcy November 2, 2009 10:20

Looks like Bonwick and Co. have been up to some neat stuff. I’m not even going to be snarky this time; it even looks like they’ve thought through the hash-collision tradeoffs, made some pretty reasonable choices, and refrained from the overzealous claims that marred the ZFS and FISHworks introductions. Nice work, guys!

2 Responses to “ZFS gets dedup”

  1. Jeff Garzik on 02 Nov 2009 at 7:31 pm

    Yuck, hash collisions rear their ugly head.

    Being realistic, I think few people will run with the ‘verify’ option enabled. Which is sad.

    When considering truly -massive- amounts of storage, in use for years, the probability of a collision (like the probability of life on another planet) does increase. And the cost of being wrong is -very- high: silent data corruption.

  2. Jeff Darcy on 02 Nov 2009 at 9:47 pm

    I suspect that Jeff B’s answer would be something about boiling the oceans, and with 256-bit hashes he might even have a point. At some point the probability of a correlated hardware failure exceeds that of a hash collision, and the probability of a software bug dwarfs either. Anybody who’s *that* worried about a 1/2^128 chance shouldn’t be using de-dup at all, even with “verify” set.

    It is interesting, though, that Jeff B posts no actual performance numbers, most notably those with “verify” turned on. So far they seem to be sticking to their guns on the safety of running without verification, but even if they’re proven wrong I’d expect a round of benchmarks with it off because that makes them look better.
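To put rough numbers on the collision odds debated in the comments above, here is a minimal sketch of the standard birthday-bound approximation. It is not taken from the ZFS code or from Bonwick's post. The function name `collision_probability`, the 1 EiB pool size, and the 128 KiB block size are assumptions chosen for illustration, and the hash is assumed to behave like a uniformly random 256-bit function.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int = 256) -> float:
    """Approximate chance that at least two of num_blocks unique blocks
    share a hash, using the birthday bound:
        p ~= 1 - exp(-n * (n - 1) / 2^(bits + 1))
    Assumes the hash output is uniformly distributed (an idealization).
    """
    exponent = -num_blocks * (num_blocks - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


# Hypothetical pool: 1 EiB of unique data in 128 KiB blocks = 2**43 blocks.
blocks = 2**60 // 2**17
print(f"{blocks} blocks -> p ~= {collision_probability(blocks):.3e}")
```

Under these assumptions the result is about 2**-171, which is why the argument tends to shift toward hardware faults and software bugs rather than the hash itself. The estimate says nothing about deliberately constructed collisions or about weaknesses in a particular hash function.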
