One of the themes that keeps coming up in my cloud work is where running code should live. For example, let’s say I’m a cloud provider (specifically an IaaS cloud), and my users keep asking about one of the various key/value (or column) stores I’ve been writing about recently. Let’s pick Voldemort, just because I’ve written less about it than the others so far. Should I tell them to run Voldemort privately, within their own compute instances, or should I provide a permanent public Voldemort cloud that they can connect to? Here’s where the Catch-22 comes in.

  • I can’t set up a public Voldemort cloud for my users, because it – like practically everything else in this space – lacks even the concept of multiple users who must be separately authenticated. Any user can go and read or overwrite any other user’s data. The only way any sane person could run any of these stores in the cloud would be inside their own instances, behind the firewall associated with them.
  • Unfortunately, since virtualized network and storage performance are still very much works in progress, users who run this kind of stuff inside their own instances are going to see performance that sucks compared to running on bare metal.
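
To make the first half of that Catch-22 concrete, here is a minimal sketch of the layer that's missing. It is not how Voldemort or any other real store works; `TenantScopedStore`, its methods, and the shared-secret token scheme are all names I made up for illustration, and the backend is just an in-memory dict standing in for a real store client.

```python
import hashlib
import hmac
from collections import defaultdict


class TenantScopedStore:
    """Hypothetical wrapper adding per-tenant isolation to a flat key/value store."""

    def __init__(self, backend=None):
        # Any dict-like object works; a real deployment would wrap a store
        # client here instead of an in-memory dict (assumption).
        self._backend = {} if backend is None else backend
        self._secrets = {}            # tenant name -> shared secret (bytes)
        self._ops = defaultdict(int)  # tenant name -> count of authenticated operations

    def register(self, tenant, secret):
        self._secrets[tenant] = secret.encode()

    def token_for(self, tenant):
        # Stand-in for real credential issuing: an HMAC of the tenant name.
        return hmac.new(self._secrets[tenant], tenant.encode(),
                        hashlib.sha256).hexdigest()

    def _check(self, tenant, token):
        if tenant not in self._secrets:
            raise PermissionError("unknown tenant")
        if not hmac.compare_digest(self.token_for(tenant), token):
            raise PermissionError("bad token")
        self._ops[tenant] += 1  # minimal accounting: count authenticated calls

    def put(self, tenant, token, key, value):
        self._check(tenant, token)
        # Keys are stored as (tenant, key) tuples so tenants can never collide.
        self._backend[(tenant, key)] = value

    def get(self, tenant, token, key):
        self._check(tenant, token)
        return self._backend.get((tenant, key))

    def usage(self, tenant):
        return self._ops[tenant]
```

Two tenants can now use the same key without seeing each other's data, and a tenant presenting someone else's token gets a `PermissionError` instead of the data. A wrapper like this does nothing for performance or fault isolation, which is a big part of why bolting it on from outside is not a real answer.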

This is pretty much the same problem I’ve been grappling with for parallel filesystems. At least most of those have a concept of multiple users, but they really don’t know about a three-level provider/tenant/user hierarchy, so the best you can do is map cloud tenants to filesystem users and say that anything beyond that is the tenants’ problem. The same goes for quotas. Hooks to support billing for all of this are also completely missing in both key/value stores and filesystems.

The problem is that, thanks to virtualization, we’re in the kind of world Intel and AMD were predicting for us before they hit the clock-speed wall. Processors haven’t become faster in absolute terms, but they’ve become faster relative to networking and storage. If you can get storage at more than 30MB/s and networking at more than 200Mb/s (with a 1500-byte MTU, and don’t even get me started on latency) in a virtualized environment, you’re doing really well. A cloud instance is like a processor from three years ago paired with I/O from seven years ago. The system balance is quite different from what it is for things that run native, and – as my disparate results for the same code running on Amazon and Rackspace show – system balance really affects performance.
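
To put those two figures on a common scale, here's a quick back-of-the-envelope calculation. It assumes decimal units (1 MB = 1,000,000 bytes), full-size 1500-byte packets, and no protocol overhead, so treat the results as rough upper bounds rather than measurements.

```python
STORAGE_MB_PER_S = 30      # storage figure quoted above
NETWORK_MBIT_PER_S = 200   # network figure quoted above
MTU_BYTES = 1500

# 200 megabits per second is 25 megabytes per second.
network_mb_per_s = NETWORK_MBIT_PER_S / 8

# Packets needed per second to fill the link with full-size packets.
packets_per_s = NETWORK_MBIT_PER_S * 1_000_000 / 8 / MTU_BYTES


def transfer_seconds(gigabytes, mb_per_s):
    """Seconds to move the given number of (decimal) gigabytes at a given rate."""
    return gigabytes * 1000 / mb_per_s


print(network_mb_per_s)                          # 25.0
print(round(packets_per_s))                      # 16667
print(transfer_seconds(1, network_mb_per_s))     # 40.0
print(round(transfer_seconds(1, STORAGE_MB_PER_S), 1))  # 33.3
```

In other words, even when you're "doing really well," the network tops out around 25MB/s, and moving a single gigabyte takes over half a minute on either path.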

What this means is that developers who expect their code to run in the cloud are faced with a choice. Dismissing cloud (or other virtualized) environments is not part of that choice. Whether you agree with the underlying rationale or not, the empirical reality for now and the next few years is that many people are moving in that direction. If your code runs poorly in such environments, answering “you’re doing it wrong” is just going to annoy the people who reported that fact. You’re the one who’s doing it wrong, and you have three ways to do it right.

  • Design software that’s not unnecessarily dependent on a native system balance to achieve decent performance, test performance in cloud environments, and provide parameter sets appropriate for those environments. You might even consider providing multiple parameter sets for different cloud providers and instance types, since they all present different performance profiles.
  • Start providing multi-tenant features so your software can be deployed as a standalone service on non-virtualized hardware. This means providing multi-user authentication and access control, some level of performance and fault isolation, and at least minimal accounting hooks.
  • Accept that your software is not a fit for an IaaS cloud, and make sure you don’t imply otherwise when you promote your project. There are still plenty of ways to work in a PaaS or SaaS cloud, or in a high-scale but non-cloud environment.
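
As a rough illustration of the first option, per-environment parameter sets could be as simple as a table of overrides layered on top of native defaults. Everything below is hypothetical: the parameter names, the values, and the provider and instance-type labels are placeholders I invented, not tuning advice for any real provider or store.

```python
# Defaults tuned for a native (bare-metal) system balance. Hypothetical values.
DEFAULTS = {
    "read_buffer_kb": 64,
    "max_inflight_requests": 32,
    "request_timeout_ms": 500,
}

# Overrides keyed by (provider, instance_type); None means "any instance type".
# Provider and instance names are placeholders, not real offerings.
PROFILES = {
    ("provider-a", None): {"read_buffer_kb": 256, "request_timeout_ms": 2000},
    ("provider-a", "small"): {"max_inflight_requests": 8},
}


def settings_for(provider, instance_type=None):
    """Layer provider-wide, then instance-specific overrides on the defaults."""
    settings = dict(DEFAULTS)
    settings.update(PROFILES.get((provider, None), {}))
    if instance_type is not None:
        settings.update(PROFILES.get((provider, instance_type), {}))
    return settings


print(settings_for("provider-a", "small"))
```

An unknown provider simply falls back to the native defaults. The numbers that actually belong in such a table can only come from testing in each environment, which is the real work behind that first option.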

Another way to look at this is to say that where there’s failure there’s also opportunity. The first key/value store to offer full support for multi-tenant environments along with competitive performance/scalability and robustness could get itself established as a de facto standard before the others catch up. Some of the projects I’ve mentioned before are already heading in that direction, others probably have plans to, and there are almost certainly still other projects I haven’t even heard of. If I weren’t already committed to working on a cloud filesystem instead, I’d seriously consider joining one of these projects to move it along.