Pomegranate is a new distributed filesystem, apparently oriented toward serving many small files efficiently (thanks to @al3xandru for the link). Here are some fairly disconnected thoughts/impressions.
- The HS article says that “Pomegranate should be the first file system that is built over tabular storage” but that’s not really accurate. For one thing, Pomegranate is only partially based on tabular storage for metadata, and relies on another distributed filesystem – Lustre is mentioned several times – for bulk data access. I’d say Ceph is more truly based on tabular storage (RADOS) and it’s far more mature than Pomegranate. I also feel a need to mention my own CassFS and VoldFS, and Artur Bergman’s RiakFuse, as filesystems that are completely based on tabular storage. They’re not fully mature production-ready systems, but they are counterexamples to the original claim.
- One way of looking at Pomegranate is that they’ve essentially replaced the metadata layer from Lustre/PVFS/Ceph/pNFS with their own while continuing to rely on the underlying DFS for data. Perhaps this makes Pomegranate more of a meta-filesystem or filesystem sharding/caching layer than a full filesystem in and of itself, but there’s nothing wrong with that just as there’s nothing wrong with similar sharding/caching layers for databases. Compared to Lustre, this is a significant step forward since Pomegranate’s metadata is fully distributed. Compared to Ceph, though, it’s not so clearly innovative. Ceph already has a distributed metadata layer, based on advanced distribution algorithms to distribute load etc. Pomegranate’s use of ring-based consistent hashing suits my own preference a little better than Ceph’s tree-based approach (CRUSH), but there are many kinds of ring-based hashing and it looks like Pomegranate won’t really catch up to Ceph in this regard until their scheme is tweaked a few times.
- I’m really not wild about the whole “in-memory architecture” thing. If your update didn’t make it to disk because it was at the end of the in-memory queue and hadn’t been flushed yet, that’s no better for reliability than if you just left it in memory for ever (though it does improve capacity) and if you acknowledged the write as complete then you lied to the user. Prompted by some of the hyper-critical and hypocritical comments I’ve seen lately bashing one project for lack of durability, I have another blog post I’m working on about how the critics’ own toys can lose or corrupt data, and how claiming superior durability while using “unsafe” settings for benchmarks is dishonest, so I’ll defer most of that conversation for now. Suffice it to say that if I were to deploy Pomegranate in production one of the first things I’d do would be to force the cache to be properly write-through instead of write-back.
- I can see how the Pomegranate scheme efficiently supports looking up a single file among billions, even in one directory (though the actual efficacy of the approach seems unproven). What’s less clear is how well it handles listing all those files, which is kind of a separate problem similar to range queries in a distributed K/V store. This is something I spent a lot of time pondering for VoldFS, and I’m rather proud of the solution I came up with. I think that solution might be applicable to Pomegranate as well, but need to investigate further. Can Ma, if you read this, I’d love to brainstorm further on this.
- Another thing I wonder about is the scalability of Pomegranate’s approach to complex operations like rename. There’s some mention of a “reliable multisite update service” but without details it’s hard to reason further. This is a very important issue because this is exactly where several efforts to distribute metadata in other projects – notably Lustre – have foundered. It’s a very very hard problem, so if one’s goal is to create something “worthy for [the] file system community” then this would be a great area to explore further.
Some of those points might seem like criticism, but they’re not intended that way – or at least they’re intended as constructive criticism. They’re things I’m curious about, because I know they’re both difficult and under-appreciated by those outside the filesystem community, and they’re questions I couldn’t answer from a cursory examination of the available material. I hope to examine and discuss these issues further, because Pomegranate really does look like an interesting and welcome addition to this space.