Status Report 2011-07

I know July’s not over, but I’ll probably be doing these less than monthly so I doubt there will be a naming conflict, and enough has been going on that I figure an update here is worthwhile. In brief, as part of the push to meet Fedora 16 deadlines, I think we can reasonably claim that CloudFS now meets the “functionally complete and testable” standard. In other words, a normal person can set it up in reasonable time (unlike the version I tried to push into Fedora 15) and can reasonably expect its advertised features to work correctly most of the time. It’s still young code and I expect plenty of bug fixes, but there won’t be any significant feature additions or redesigns for the version in Fedora 16. Here’s a little more detail about specific features.

  • Namespace isolation
    This is complete, including management support, and has been for a while. You can add tenants, associate them with volumes, and then mount a set of tenant-specific subdirectories across all of your distributed/replicated volumes as a single unified (but tenant-private) namespace.
  • ID isolation
    This has also been complete for a while, thanks to Kaleb. Configuration of ID ranges is manual and uses the same range for all volumes accessible by a tenant, which introduces a sort of tiling problem when you have many tenants with overlapping access to many volumes, but in the end each tenant does get a unique set of IDs on each server and all of the necessary ID mapping is in place.
  • At-rest encryption
    It works, it’s not too expensive, but in the interests of full disclosure I’d have to say that what’s there right now is kind of weak. While it will protect against casual snooping, it has known vulnerabilities – e.g. watermark attacks, targeted single-bit modification – that make it no more than a placeholder. A much more secure encryption method has been developed in collaboration with some of Red Hat’s world-class security experts, and is partially implemented but won’t appear until a post-1.0 release.
  • In-flight encryption
    The OpenSSL-enabled transport module, which is also multi-threaded to ameliorate the performance penalty associated with running SSL in the I/O path, is now part of the package. It can coexist with the regular GlusterFS socket transport, using the Gluster transport for GlusterFS volumes and the CloudFS transport for CloudFS volumes, even simultaneously. Some day this duplication will go away, in favor of having a single SSL-capable and multi-threaded socket transport as part of GlusterFS itself, but it just turned out that we couldn’t make that happen in time for Fedora 16.
  • Management
    The management GUI and CLI described here previously is still in place, and hasn’t changed much except for additional pieces to deal with the two kinds of encryption.
  • Packaging
    Kaleb has done yeoman’s work whipping this into shape, with the happy result that CloudFS should soon be just a “yum install” away.
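To make the namespace-isolation item a bit more concrete, here is a minimal sketch of the path-prefixing idea, assuming each tenant’s files live in a per-tenant subdirectory on every brick. The function names and the brick layout are hypothetical, not CloudFS’s actual translator code. The one detail worth showing is that the tenant-visible path is normalized as if rooted at "/", so a ".." can never climb out of the tenant’s subtree.

```python
from __future__ import annotations

import posixpath


def tenant_brick_path(brick_root: str, tenant: str, client_path: str) -> str:
    """Map a path as seen by a tenant to the corresponding path on one brick.

    Hypothetical layout: <brick_root>/<tenant>/<client_path>.
    """
    if not tenant or "/" in tenant or tenant in (".", ".."):
        raise ValueError(f"bad tenant name: {tenant!r}")
    # Normalize as an absolute path: leading ".." components collapse at "/",
    # so the result always stays inside the tenant's subdirectory.
    normalized = posixpath.normpath("/" + client_path)
    relative = normalized.lstrip("/")
    base = posixpath.join(brick_root, tenant)
    return posixpath.join(base, relative) if relative else base


def tenant_view(bricks: list[str], tenant: str, client_path: str) -> list[str]:
    """The same tenant path on every brick of every volume the tenant can see."""
    return [tenant_brick_path(b, tenant, client_path) for b in bricks]
```

The unified namespace then falls out of the fact that every brick applies the same prefix, so the usual distribution and replication logic works unchanged underneath.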
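For the ID-isolation item, the sketch below assumes the simplest scheme consistent with the description: each tenant gets one fixed server-side range (base, size) that is shared across every volume it can access, and tenant-local ID u maps to base + u. The `IdRange` class and `find_conflicts` helper are hypothetical. The second function shows the tiling problem: because a tenant’s range is the same everywhere, any two tenants that share even one volume must have disjoint ranges.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class IdRange:
    base: int  # first server-side ID assigned to this tenant
    size: int  # number of IDs in the range

    def to_server(self, local_id: int) -> int:
        """Tenant-local UID/GID -> server-side UID/GID."""
        if not 0 <= local_id < self.size:
            raise ValueError(f"local ID {local_id} outside tenant range")
        return self.base + local_id

    def to_tenant(self, server_id: int) -> int:
        """Server-side UID/GID -> tenant-local UID/GID."""
        if not self.base <= server_id < self.base + self.size:
            raise ValueError(f"server ID {server_id} not in this range")
        return server_id - self.base

    def overlaps(self, other: IdRange) -> bool:
        return self.base < other.base + other.size and other.base < self.base + self.size


def find_conflicts(ranges: dict[str, IdRange],
                   access: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (volume, tenant_a, tenant_b) wherever two tenants sharing a volume overlap."""
    conflicts = []
    for volume, tenants in sorted(access.items()):
        for a, b in combinations(sorted(tenants), 2):
            if ranges[a].overlaps(ranges[b]):
                conflicts.append((volume, a, b))
    return conflicts
```

Fully automatic, per-volume mapping (on the wish list further down) would relax this, since ranges would only need to be disjoint per volume rather than globally.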
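As an illustration of the “targeted single-bit modification” weakness mentioned under at-rest encryption, here is a toy example. It does not reproduce CloudFS’s actual cipher; it uses a stand-in XOR keystream built from SHA-256 in counter mode, which shares the relevant property with any unauthenticated stream-style mode: flipping a ciphertext bit flips exactly the same plaintext bit, with no way for the reader to notice.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks. NOT a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (the same operation) by XOR with the keystream."""
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


def flip_bit(ciphertext: bytes, byte_index: int, bit: int) -> bytes:
    """What an attacker with write access to the brick can do without the key."""
    tampered = bytearray(ciphertext)
    tampered[byte_index] ^= 1 << bit
    return bytes(tampered)


key, nonce = b"\x01" * 32, b"file-0001"
ciphertext = xor_crypt(key, nonce, b"amount=100")
# '1' is 0x31 and '9' is 0x39, so flipping bit 3 of byte 7 turns 100 into 900.
tampered_plaintext = xor_crypt(key, nonce, flip_bit(ciphertext, 7, 3))
```

Detecting this kind of tampering requires some form of integrity check (a MAC or an authenticated encryption mode), which plain XOR-style encryption does not provide.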

That’s a pretty good milestone, nearly concurrent with my two-year anniversary at Red Hat. I’m happy. I’m also a perfectionist, so here’s a list of things that I think will improve even further in the near to medium term.

  • Fully automatic and per-volume ID mapping.
  • Stronger at-rest encryption, including encryption of file names as well as contents.
  • Full integration of SSL authentication and authorization, eliminating the need for a separate layer of per-volume access control. I actually do have a patch for this, which I have submitted for inclusion in GlusterFS, so it’s mainly a release-engineering rather than development effort at this point.
  • Better UI. The web UI we have is very spartan, plain HTML forms, not even a nice sidebar to ease navigation. It doesn’t have to be all shiny or AJAXy, but a little love from a true web UI person would go a long way.
  • SSL for the web UI. Again, I have code for this but some assembly is still required.
  • UI support for quota. ‘Nuff said.
  • Bugs, performance, bugs, documentation, bugs.
  • Packaging. This doesn’t have to be a Fedora-only project. If anyone wants to help develop packages for other distributions, I’ll pay you in beer or cookies or whatever comestibles you prefer.
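Regarding the SSL authorization item: the idea is that once the transport has verified a client certificate, the certificate identity itself can drive per-volume authorization. The sketch below is not my GlusterFS patch; it assumes certificates are identified by their subject commonName and that a hypothetical allow-list maps each volume to permitted names. The certificate dict follows the format returned by Python’s `ssl.SSLSocket.getpeercert()` after a handshake with client verification enabled.

```python
from __future__ import annotations

from typing import Optional


def peer_common_names(cert: dict) -> list[str]:
    """Extract commonName values from a getpeercert()-style certificate dict.

    'subject' is a tuple of RDNs, each RDN a tuple of (key, value) pairs.
    """
    return [value
            for rdn in cert.get("subject", ())
            for key, value in rdn
            if key == "commonName"]


def authorize(cert: Optional[dict], volume: str,
              allowed: dict[str, set[str]]) -> bool:
    """Allow access only if a verified peer's commonName is listed for the volume."""
    if not cert:  # no certificate presented, or verification was not requested
        return False
    permitted = allowed.get(volume, set())
    return any(name in permitted for name in peer_common_names(cert))
```

With this kind of check in the transport, there is no need for a separate per-volume access-control layer: the same credential that protects the connection decides what the client may mount.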

That’s sort of CloudFS 1.1; for CloudFS 2.0 there’s another long list of things that we might do. Here are the big two.

  • Faster replication. The current replication translator (AFR) requires several network round trips per write, resulting in sub-optimal performance for latency-sensitive workloads. Using ideas from Amazon’s Dynamo and similar systems, I believe we can do replication so that the normal (non-failure) case requires only N parallel writes (one per replica, for N replicas), with negligible effect on performance. This might make GlusterFS/CloudFS more appealing for database, virtual-machine-image, and other workloads that currently fit the “bad” performance profile.
  • Namespace caching. Two of the most painful things you can do on GlusterFS, or on most similar filesystems, are listing large directories and loading lots of small files (such as the hundreds that many PHP scripts seem to load). The idea here is to have a client-side translator that caches at least the directory information, and possibly even the contents of small files, on local disk (not just memory), with fairly weak consistency but very high performance. This would make GlusterFS/CloudFS more appealing for read-mostly workloads where this is the right consistency/performance tradeoff, most especially web service.
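To sketch what “N parallel writes” means for the faster-replication item, here is a Dynamo-style quorum write, assuming each replica is represented by a hypothetical callable that raises on failure. The write is issued to all N replicas at once and reported successful as soon as W of them acknowledge, so the normal case costs one parallel round trip rather than several sequential ones. This is not AFR’s protocol, and repairing a replica that missed a write is deliberately left out.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence

# Hypothetical replica interface: write(path, offset, data), raising on failure.
WriteFn = Callable[[str, int, bytes], None]


def quorum_write(replicas: Sequence[WriteFn], path: str, offset: int,
                 data: bytes, w: int, pool: ThreadPoolExecutor) -> bool:
    """Send the write to all replicas in parallel; succeed once w acknowledge."""
    if not 1 <= w <= len(replicas):
        raise ValueError("need 1 <= w <= number of replicas")
    futures = [pool.submit(r, path, offset, data) for r in replicas]
    acks = failures = 0
    for fut in as_completed(futures):
        if fut.exception() is None:
            acks += 1
            if acks >= w:
                return True  # remaining writes keep running in the pool
        else:
            failures += 1
            if failures > len(replicas) - w:
                return False  # quorum is no longer reachable
    return False


def make_replica(store: dict, fail: bool = False) -> WriteFn:
    """In-memory stand-in for a replica, for demonstration only."""
    def write(path: str, offset: int, data: bytes) -> None:
        if fail:
            raise IOError("replica down")
        store[(path, offset)] = data
    return write
```

Usage, with one of three replicas down and W = 2:

```python
stores = [{}, {}, {}]
with ThreadPoolExecutor(max_workers=3) as pool:
    replicas = [make_replica(stores[0]), make_replica(stores[1]),
                make_replica(stores[2], fail=True)]
    ok = quorum_write(replicas, "/f", 0, b"x", w=2, pool=pool)
```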
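For the namespace-caching item, a minimal sketch of the weak-consistency tradeoff: directory listings are cached on local disk and trusted for a fixed TTL without asking the servers, so repeated listings of a huge directory cost a local file read. The `DirListingCache` class, its `fetch` callback, the JSON-file layout, and the TTL policy are all assumptions for illustration, not a design for the actual translator.

```python
from __future__ import annotations

import hashlib
import json
import os
import time
from typing import Callable


class DirListingCache:
    """Hypothetical on-disk cache of directory listings with TTL-based (weak) consistency."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], list[str]],
                 ttl: float = 30.0, clock: Callable[[], float] = time.time):
        self.cache_dir = cache_dir
        self.fetch = fetch  # the expensive listing from the servers
        self.ttl = ttl      # seconds during which a cached listing is trusted
        self.clock = clock
        os.makedirs(cache_dir, exist_ok=True)

    def _entry_path(self, dir_path: str) -> str:
        digest = hashlib.sha256(dir_path.encode()).hexdigest()
        return os.path.join(self.cache_dir, digest + ".json")

    def listdir(self, dir_path: str) -> list[str]:
        entry = self._entry_path(dir_path)
        try:
            with open(entry) as f:
                cached = json.load(f)
            if self.clock() - cached["fetched_at"] < self.ttl:
                return cached["names"]  # possibly stale, by design
        except (OSError, ValueError, KeyError):
            pass  # missing or unreadable entry: fall through to a fresh fetch
        names = self.fetch(dir_path)
        tmp = entry + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"fetched_at": self.clock(), "names": names}, f)
        os.replace(tmp, entry)  # atomic rename, so readers never see partial JSON
        return names
```

Because entries live on disk rather than only in memory, a cache directory reused across client restarts still avoids the expensive listing until its entries expire.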

There’s even more stuff to do for CloudFS 3.0, from erasure codes or information-dispersal algorithms to ordered asynchronous multi-site replication/caching (quite distinct IMO from Gluster’s unordered active/passive replication), but they’re all so far off that it hurts even to think about them. Astute readers might also notice that improved distribution doesn’t make any of these lists. I still think there are elasticity/scaling problems that need to be solved with Gluster’s current approach, but they’ve been making such great progress in those areas lately that I don’t think it needs to be a focus for CloudFS. Heck, for all I know some of the items I do mention above don’t need to be done as part of CloudFS because they’ll be done as part of GlusterFS, and that would be great. My reason for putting this wish list up here is mainly so that people can let folks on both projects know what features would make both more appealing to actual live users. Feel free to use the comments for that.


2 Responses

  1. Can Ma says:

    Listing large directories may be useful, but listing huge directories may be meaningless. When you want to list a directory, you might actually want to get a specific file or a few files. If you get too many directory entries (information) without a focus, you get nothing. Thus, maybe we need a powerful ‘search or something similar’ API in large-scale distributed file systems to get focused information from directories. We have done some preliminary exploration of this issue.
    For small files, client caching is good. However, if you cache a lot of small files on local disk, local file systems might suck too :-)

  2. Hi Jeff. I just came across the existence of CloudFS/HekaFS a few hours ago, and I am very, very excited — everything you’re saying is something we’ve needed in my work environment for a while now. Thanks for tackling these things!