Archive for July, 2011

Moving to the Cloud

Harald Welte seems to be missing the point of the U.S. government announcing the closure of 800 data centers. If you read the NYT article carefully, you’ll see that the relationship between a government move toward cloud computing and these 800 closures is tenuous at best, and it never even mentions moving anything to a public cloud. Let’s look at Harald’s specific complaints.

[As a government, you...] make yourself dependent from a private company to supply essential infrastructure

We don’t even know that’s the case here. They could be moving from these 800 data centers to a wholly government-owned and government-run private cloud that spans several larger data centers. That’s what they have already done in some parts of the government, and what I’m sure they will continue to do for some parts of this program. They will almost certainly outsource some functionality from these data centers to public or semi-public (leased) cloud facilities, but that might not involve any essential infrastructure (Harald’s term). There’s a lot the government does that’s not essential. Would it really be that much of a problem if, for example, some functions of the National Archives and Records Administration moved to a public cloud? I don’t think so. Let’s worry about the government indiscriminately outsourcing military/police/security functions before we worry about selective IT outsourcing.

introduce single points of failure (technically, administratively)

Why assume that there will be more SPOFs than there are now? If a particular government department currently hosts machines in one data center it owns, and trades that for resources in several government-cloud data centers that it shares with 799 other departments, how is that any worse? The per-application architecture will be more or less free from single points of failure just as it would be if privately hosted, and the fewer/larger cloud centers are likely to be far better run than the lower third (if not all) of those 800 department-private data centers are today. As long as there are at least three or four data centers involved, and there are sure to be more than that, there might well be a net improvement in availability.

give up control over who physically owns and has access to the data
In fact, you will have a hard time even finding anyone at all who can tell you where your data is physically located. Maybe even out of the country?

This is the part that makes the topic relevant for CloudFS. Moving data to the cloud does not have to mean giving up knowledge of its location or control over access to it. Remember, private or leased clouds haven’t been ruled out, and in those you have complete control over location. Even in public clouds, this is often the case; for example, each Amazon EC2 region exists at a fairly well-known physical location. Public-cloud providers already cater to users who must comply with rules like HIPAA and SarbOx and PCI and EUPD that set requirements regarding data location, so this could not be otherwise. With respect to access, if you do encryption and key management the right way – as we try to in CloudFS – then you have the same control over data access in the cloud that you do on your own machines. In fact, I’ll bet there are hundreds of government departments whose data centers are less physically secure than those at Amazon or Rackspace, and who either don’t use encryption at all or don’t use it effectively. Moving to the cloud doesn’t solve their encryption problem, but it doesn’t make that problem any worse either, and the effect of this move on their physical security is likely to be a positive one.
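To make the access-control point a bit more concrete, here’s a minimal sketch of the client-side-encryption pattern: the key is generated and kept by the data’s owner, and the provider only ever sees ciphertext. This is an illustration of the general idea, not CloudFS’s actual mechanism; it assumes the third-party “cryptography” Python package, and cloud_put/cloud_get with the in-memory fake_cloud dict are hypothetical stand-ins for whatever object-store API you actually use.

```python
# Minimal sketch: client-side encryption so the provider never sees
# plaintext or keys. Uses the third-party "cryptography" package (Fernet).
from cryptography.fernet import Fernet

fake_cloud = {}                      # hypothetical stand-in for an object store

def cloud_put(name, blob):           # placeholder for a real upload call
    fake_cloud[name] = blob

def cloud_get(name):                 # placeholder for a real download call
    return fake_cloud[name]

key = Fernet.generate_key()          # generated and stored only on the client
f = Fernet(key)

plaintext = b"tenant data the provider should never be able to read"
cloud_put("report.txt", f.encrypt(plaintext))   # provider stores ciphertext only

assert f.decrypt(cloud_get("report.txt")) == plaintext
```

The point is simply that whoever holds the key controls access to the data, no matter whose racks the ciphertext happens to sit in.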

So, is a move from 800 mediocre data centers to a couple of dozen world-class ones really such a bad idea? I’d never underestimate the government’s ability to mess something like this up, and any transition creates opportunity for failure, but none of Harald’s objections stand up to scrutiny. Whether this initiative succeeds or not is a matter of execution; fundamentally, the architecture they’re embracing is still far better than the one they’re abandoning.

 

Data Integrity

One of the possible CloudFS 2.0 features that I forgot to mention in my post yesterday, and which was subsequently brought to my attention, is the addition of data-integrity protection using checksums or similar. Some people think this is just part of encryption, but it’s really more than that. The same kind of protection exists at the block level in the form of T10 DIF and DIX, or at the filesystem level inside ZFS or btrfs, and none of those were developed with encryption in mind. The basic idea behind all of them is that for each data block stored there is also a checksum that’s stored and associated with it. Then, when the block is read back, the checksum is read back too and verified before the data is admitted to the system. The only way encryption enters the picture is that the integrity check for a given data block has to be unforgeable, which implies using an HMAC instead of a simple checksum. That protects not only against random corruption but also against a targeted attack that modifies both the data and its integrity tag: a plain checksum can simply be recomputed by the attacker, while a valid HMAC can’t be forged without the key, even if the stored tag is just as accessible to the attacker as the data itself (as it might be in naive designs, though ours is likely to avoid that flaw). Thus, we need three things.

  • A translator (probably client-side) to handle the storage of checksums/HMACs received from its caller during a write, and also their retrieval – but not checking – during a read.
  • A translator that generates and checks simple checksums for the non-encrypted case.
  • Enhancements to the encryption translator to generate and check HMACs instead.

Another possibility would be to implement all of the code inside the encryption translator and simply have an option to turn actual encryption off. Such a “monolithic” approach is unappealing both for technical reasons and because it would preclude offering the data-integrity feature as part of GlusterFS while keeping encryption as part of CloudFS’s separate value proposition.
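To make the checksum-versus-HMAC distinction concrete, here’s a minimal Python sketch of per-block integrity protection against a toy dict-backed block store. It illustrates the idea described above, not the actual translator code; the key handling and block layout are invented for the example.

```python
# Sketch of per-block integrity protection: plain checksum vs. keyed HMAC.
import hashlib, hmac, os, zlib

KEY = os.urandom(32)   # in a real system this would come from key management

def _tag(data, key):
    # Unencrypted case: a plain checksum catches random corruption.
    # Keyed case: an HMAC also defeats deliberate data+tag tampering,
    # because forging a valid tag requires the key.
    if key is None:
        return zlib.crc32(data).to_bytes(4, "big")
    return hmac.new(key, data, hashlib.sha256).digest()

def write_block(store, block_id, data, key=None):
    store[block_id] = (data, _tag(data, key))

def read_block(store, block_id, key=None):
    data, tag = store[block_id]
    if not hmac.compare_digest(tag, _tag(data, key)):
        raise IOError("integrity check failed for block %s" % block_id)
    return data

store = {}
write_block(store, 0, b"hello", key=KEY)
assert read_block(store, 0, key=KEY) == b"hello"

# An attacker who can rewrite both the data and the tag can fool a checksum,
# but cannot produce a valid HMAC without KEY.
store[0] = (b"evil", zlib.crc32(b"evil").to_bytes(4, "big"))
try:
    read_block(store, 0, key=KEY)
except IOError:
    pass   # tampering detected
```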

If you think this kind of data-integrity protection would be important enough for you to justify making it part of CloudFS 2.0, please let me know. It’s much easier to make a case that Red Hat (or Gluster) resources should be devoted to it if there’s a clear demand from people besides the project developers.

 

Status Report 2011-07

I know July’s not over, but I’ll probably be doing these less than monthly so I doubt there will be a naming conflict, and enough has been going on that I figure an update here is worthwhile. In brief, as part of the push to meet Fedora 16 deadlines, I think we could reasonably claim that CloudFS meets the “functionally complete and testable” standard. In other words, a normal person can set it up in reasonable time (unlike the version I tried to push into Fedora 15) and can reasonably expect its advertised features to work correctly most of the time. It’s still young code, so I expect there will be plenty of bug fixes, but there won’t be any significant feature additions or redesigns for the version in Fedora 16. Here’s a little more detail about specific features.

  • Namespace isolation
    This is complete, including management support, and has been for a while. You can add tenants, associate them with volumes, and then mount a set of tenant-specific subdirectories across all of your distributed/replicated volumes as a single unified (but tenant-private) namespace.
  • ID isolation
    This has also been complete for a while, thanks to Kaleb. Configuration of ID ranges is manual and uses the same range for all volumes accessible by a tenant, which introduces a sort of tiling problem when you have many tenants with overlapping access to many volumes, but in the end each tenant does get a unique set of IDs on each server and all of the necessary ID mapping is in place (a sketch of the basic idea follows this list).
  • At-rest encryption
    It works, it’s not too expensive, but in the interests of full disclosure I’d have to say that what’s there right now is kind of weak. While it will protect against casual snooping, it has known vulnerabilities – e.g. watermark attacks, targeted single-bit modification – that make it no more than a placeholder. A much more secure encryption method has been developed in collaboration with some of Red Hat’s world-class security experts, and is partially implemented but won’t appear until a post-1.0 release.
  • In-flight encryption
    The OpenSSL-enabled transport module, which is also multi-threaded to ameliorate the performance penalty associated with running SSL in the I/O path, is now part of the package. It can coexist with the regular GlusterFS socket transport, using the Gluster transport for GlusterFS volumes and the CloudFS transport for CloudFS volumes, even simultaneously. Some day this duplication will go away, in favor of having a single SSL-capable and multi-threaded socket transport as part of GlusterFS itself, but it just turned out that we couldn’t make that happen in time for Fedora 16.
  • Management
    The management GUI and CLI described here previously are still in place, and haven’t changed much except for additional pieces to deal with the two kinds of encryption.
  • Packaging
    Kaleb has done yeoman’s work whipping this into shape, with the happy result that CloudFS should soon be just a “yum install” away.
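Since I promised a sketch of the ID-mapping idea above, here it is: each tenant gets a disjoint block of server-side UIDs, local UIDs are shifted into that block on the way to the server and shifted back on the way to the client. The range size and tenant table below are invented for illustration; as noted above, the real configuration is currently manual.

```python
# Illustrative sketch of per-tenant ID mapping (not the actual CloudFS code).
RANGE_SIZE = 10000   # hypothetical size of each tenant's UID block

tenant_base = {"acme": 100000, "globex": 110000}   # disjoint server-side blocks

def to_server_uid(tenant, local_uid):
    # Shift a tenant-local UID into that tenant's block on the server.
    if not 0 <= local_uid < RANGE_SIZE:
        raise ValueError("uid outside tenant's allowed range")
    return tenant_base[tenant] + local_uid

def to_client_uid(tenant, server_uid):
    # Reverse the mapping on the way back to the client.
    return server_uid - tenant_base[tenant]

# Two tenants can both use local uid 500 without colliding on the server:
assert to_server_uid("acme", 500) != to_server_uid("globex", 500)
assert to_client_uid("acme", to_server_uid("acme", 500)) == 500
```

The “tiling” headache mentioned above comes from having to pick those blocks so they never collide across every volume a given tenant can reach.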

That’s a pretty good milestone, nearly concurrent with my two-year anniversary at Red Hat. I’m happy. I’m also a perfectionist, so here’s a list of things that I think will improve even further in the near to medium term.

  • Fully automatic and per-volume ID mapping
  • Stronger at-rest encryption, including encryption of file names as well as contents.
  • Full integration of SSL authentication and authorization, eliminating the need for a separate layer of per-volume access control. I actually do have a patch for this, which I have submitted for inclusion in GlusterFS, so it’s mainly a release-engineering rather than development effort at this point.
  • Better UI. The web UI we have is very spartan, plain HTML forms, not even a nice sidebar to ease navigation. It doesn’t have to be all shiny or AJAXy, but a little love from a true web UI person would go a long way.
  • SSL for the web UI. Again, I have code for this but some assembly is still required.
  • UI support for quota. ‘Nuff said.
  • Bugs, performance, bugs, documentation, bugs.
  • Packaging. This doesn’t have to be a Fedora-only project. If anyone wants to help develop packages for other distributions, I’ll pay you in beer or cookies or whatever comestibles you prefer.

That’s sort of CloudFS 1.1; for CloudFS 2.0 there’s another long list of things that we might do. Here are the big two.

  • Faster replication. The current replication translator (AFR) requires several network round trips per write, resulting in sub-optimal performance for latency-sensitive workloads. Using ideas from Amazon’s Dynamo and similar systems, I believe we can do replication so that the normal (non-failure) case requires only N parallel writes, with negligible effect on performance (see the first sketch after this list). This might make GlusterFS/CloudFS more appealing for databases, virtual-machine images, and other workloads that currently fit the “bad” performance profile.
  • Namespace caching. Two of the most painful things you can do on GlusterFS, or on most similar filesystems, are listing large directories and loading lots of small files (such as the hundreds that many PHP scripts seem to load). The idea here is to have a client-side translator that caches at least the directory information, and possibly even the contents of small files, on local disk (not just in memory), with fairly weak consistency but very high performance (see the second sketch after this list). This would make GlusterFS/CloudFS more appealing for read-mostly workloads where that’s the right consistency/performance tradeoff, and most especially for web serving.
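For the replication item above, here’s a rough sketch of the “N parallel writes” idea in Python. It’s a Dynamo-flavored illustration, not AFR and not a committed CloudFS design: the write goes to all N replicas at once and returns as soon as a write quorum has acknowledged, instead of making several serialized round trips.

```python
# Quorum-style parallel replication sketch (illustration only).
import concurrent.futures

class Replica:
    """Stand-in for a storage server; a real one would be a network call."""
    def __init__(self):
        self.data = {}
    def put(self, key, value):
        self.data[key] = value
        return True

def replicate_write(replicas, key, value, write_quorum):
    acks = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(r.put, key, value) for r in replicas]
        for fut in concurrent.futures.as_completed(futures):
            if fut.result():          # replica acknowledged the write
                acks += 1
                if acks >= write_quorum:
                    return True       # success in one parallel round
    return False                      # quorum not reached; enter a repair path

replicas = [Replica() for _ in range(3)]
assert replicate_write(replicas, "file1", b"contents", write_quorum=2)
```

And for the namespace-caching item, here’s an equally rough sketch of a weakly consistent on-disk cache for directory listings. The cache location and TTL are invented for the example, and a real translator would sit in the filesystem stack rather than wrapping os.listdir.

```python
# Weakly consistent local-disk cache for directory listings (illustration only).
import json, os, time

CACHE_DIR = "/var/tmp/nscache"     # hypothetical local cache location
TTL = 30                           # seconds of allowed staleness

def _cache_path(key):
    return os.path.join(CACHE_DIR, key.replace("/", "_"))

def cached_listdir(path, real_listdir=os.listdir):
    os.makedirs(CACHE_DIR, exist_ok=True)
    cpath = _cache_path("dir:" + path)
    if os.path.exists(cpath) and time.time() - os.path.getmtime(cpath) < TTL:
        with open(cpath) as f:
            return json.load(f)            # possibly stale, but cheap
    entries = real_listdir(path)           # expensive call to the real filesystem
    with open(cpath, "w") as f:
        json.dump(entries, f)
    return entries

print(cached_listdir("/etc"))   # a second call within 30s hits the local cache
```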

There’s even more stuff to do for CloudFS 3.0, from erasure codes or information-dispersal algorithms to ordered asynchronous multi-site replication/caching (quite distinct IMO from Gluster’s unordered active/passive replication), but they’re all so far off that it hurts even to think about them. Astute readers might also notice that improved distribution doesn’t make any of these lists. I still think there are elasticity/scaling problems that need to be solved with Gluster’s current approach, but they’ve been making such great progress in those areas lately that I don’t think it needs to be a focus for CloudFS. Heck, for all I know some of the items I do mention above don’t need to be done as part of CloudFS because they’ll be done as part of GlusterFS, and that would be great. My reason for putting this wish list up here is mainly so that people can let folks on both projects know what features would make both more appealing to actual live users. Feel free to use the comments for that.