Hi, Rackspace!

I was just re-checking prices for some of the various cloud providers, and suddenly realized that Rackspace’s Cloud Server pricing page prominently features a quote from the block I/O performance comparison I did a while back. Cool. I wonder if they also saw my parallel filesystem results. Let me know when you’d like me to re-run that test, guys.

The Cloud Catch-22

One of the themes that keeps coming up in my cloud work is where running code should live. For example, let’s say I’m a cloud provider (specifically an IaaS cloud), and my users keep asking about one of the various key/value (or column) stores I’ve been writing about recently. Let’s pick Voldemort, just because I’ve written less about it than the others so far. Should I tell them to run Voldemort privately, within their own compute instances, or should I provide a permanent public Voldemort cloud that they can connect to? Here’s where the Catch-22 comes in.

  • I can’t set up a public Voldemort cloud for my users, because it – like practically everything else in this space – lacks even the concept of multiple users who must be separately authenticated and authorized. Any user can read or overwrite any other user’s data. The only way any sane person could run any of these in the cloud would be inside their own instances, behind the firewall associated with them.
  • Unfortunately, since virtualized network and storage performance are still very much works in progress, their performance running this kind of stuff inside their own instances is going to suck compared to running on bare metal.

This is pretty much the same problem I’ve been grappling with for parallel filesystems. At least most of those have a concept of multiple users, but they really don’t know about a three-level provider/tenant/user hierarchy so the best you can do is map cloud tenants to filesystem users and say that anything beyond that is the tenants’ problem. Ditto for quotas. Hooks to support billing for all of this are also completely missing in both key/value stores and filesystems.

The problem is that, thanks to virtualization, we’re in the kind of world Intel and AMD were predicting for us before they hit the clock-speed wall. Processors haven’t become faster in absolute terms, but they’ve become faster relative to networking and storage. If you can get storage at more than 30MB/s and networking at more than 200Mb/s (with a 1500-byte MTU, and don’t even get me started on latency) in a virtualized environment, you’re doing really well. A cloud instance is like a processor from three years ago paired with I/O from seven years ago. The system balance is quite different from what it is for things that run native, and – as my disparate results for the same code running on Amazon and Rackspace show – system balance really affects performance.

What this means is that developers who expect their code to run in the cloud are faced with a choice. Dismissing cloud (or other virtualized) environments is not part of that choice. Whether you agree with the underlying rationale or not, the empirical reality for now and the next few years is that many people are moving in that direction. If your code runs poorly in such environments, “you’re doing it wrong” is just going to annoy the people who reported that fact. You’re the one who’s doing it wrong, and you have three ways to do it right.

  • Design software that’s not unnecessarily dependent on a native system balance to achieve decent performance, test performance in cloud environments, and provide parameter sets appropriate for those environments. You might even consider providing multiple parameter sets for different cloud providers and instance types, since they all present different performance profiles.
  • Start providing multi-tenant features so your software can be deployed as a standalone service on non-virtualized hardware. This means providing multi-user authentication and access control, some level of performance and fault isolation, and at least minimal accounting hooks.
  • Accept that your software is not a fit for an IaaS cloud, and make sure you don’t imply otherwise when you promote your project. There are still plenty of ways to work in a PaaS or SaaS cloud, or in a high-scale but non-cloud environment.

Another way to look at this is to say that where there’s failure there’s also opportunity. The first key/value store to offer full support for multi-tenant environments along with competitive performance/scalability and robustness could get itself established as a de facto standard before the others catch up. Some of the projects I’ve mentioned before are already heading in that direction, others probably have plans to, and there are almost certainly still other projects I haven’t even heard of. If I weren’t already committed to working on a cloud filesystem instead, I’d seriously consider joining one of these projects to move it along.

Comparing Key/Value Stores, round 2

As promised, here are some more detailed results from my comparisons of various key/value stores. As mentioned previously, the methodology is pretty simple: set up a server on one (virtual) machine, fire up ten client threads on another, let it run for one minute, and count how many requests got through. I did this using a Python script per client thread along with a bash script to run all ten and tally results; a rough sketch of that driver appears right after the list below. To test cold reads, I’d stop all the servers and use “echo 3 > /proc/sys/vm/drop_caches” before restarting. As noted in the comments to the previous post, this approach does have some limitations: it measures only throughput rather than latency, it doesn’t include a warmup period to factor out high startup costs, it doesn’t generate enough data to measure high-key-count behavior, etc. I also quite deliberately measure using the longest run time of the ten spawned threads, because in my experience worst-case component performance is usually the dominant factor in overall performance, and inconsistent results are effectively bad results. FWIW, I didn’t actually see all that much variation in thread completion times. Anyway, here are the stores and interfaces I used:

  • tabled (git clone on 10/27) using boto
  • Cassandra 0.4.1 using thrift
  • Riak (hg clone on 10/27) using jiak
  • Voldemort 0.56 using my own voldemort.py (I had to fix a tcp:// URL-parsing glitch, and also fixed it so that it doesn’t disconnect/reconnect on every request)
  • Tokyo Tyrant 1.1.37 (Cabinet 1.4.36) using pytyrant
  • chunkd (git clone on 10/27) using my own chunkd.py based on Python’s ctypes module
  • Keyspace 1.2 using the built-in Python interface
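
For concreteness, here’s a rough Python sketch of the kind of driver I described above. It’s illustrative only – stop_server.sh, start_server.sh, and client.py are placeholder names, not the actual scripts – but it shows the cold-read reset and the worst-of-ten-threads tally:

    #!/usr/bin/env python
    # Illustrative outer driver: reset to a cold state if needed, spawn ten
    # client processes, and compute throughput using the LONGEST thread
    # runtime, since worst-case component performance usually dominates.
    import subprocess, sys

    def reset_cold(store):
        # Stop the server, flush and drop the page cache, restart.
        # (Placeholder scripts; drop_caches requires root.)
        subprocess.call(["./stop_server.sh", store])
        subprocess.call("sync; echo 3 > /proc/sys/vm/drop_caches", shell=True)
        subprocess.call(["./start_server.sh", store])

    def run_test(store, op, nthreads=10):
        if op == "cold-read":
            reset_cold(store)
        procs = [subprocess.Popen(["python", "client.py", store, op],
                                  stdout=subprocess.PIPE)
                 for _ in range(nthreads)]
        total_reqs, worst_time = 0, 0.0
        for p in procs:
            out, _ = p.communicate()
            # Each client prints "<request_count> <elapsed_seconds>".
            reqs, elapsed = out.decode().split()
            total_reqs += int(reqs)
            worst_time = max(worst_time, float(elapsed))
        print("%s %s: %.1f requests/sec" % (store, op, total_reqs / worst_time))

    if __name__ == "__main__":
        run_test(sys.argv[1], sys.argv[2])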

I have a spreadsheet of the results, including some more configuration information. I’ll note that every test was run at least twice, sometimes several times on separate sets of instances, and I never saw enough variation to suggest that the results were being affected by contention with other instances on the same physical machine. The results did shed some interesting light on differences between virtualized environments, which wasn’t my intent, but that information might be useful to some people in its own right. I also don’t think those differences affected the comparison of the stores; the “finish order” did shift around some, but in general if store X significantly outperformed store Y in one environment it was highly likely to do so in another. The results are what they are, and I think they do reflect the inherent performance characteristics of the stores more than anything else. The last general note I’ll make is that all of the results seemed slow to me. I’ve seen lots of claims about this store doing 20K ops/s/node, that store doing 100K, etc. Bunk. Maybe on a well-provisioned non-virtualized node, tweaked to the max, using more threads than any sane application writer would consider, such numbers are achievable. I very much doubt that’s the case for realistic workloads in a realistic environment. Caveat emptor.

I’ve already discussed the results from my own desktop, so let’s start with the EC2 single-node results. I was quite surprised that the results overall were better than on my machine, since the opposite has been true for other tests (e.g. parallel filesystems). Nonetheless, some interesting patterns started to emerge. Excluding Tyrant, chunkd consistently led for warm reads, while Keyspace did so for cold reads. This probably reflects how well each does network and disk I/O, with chunkd doing the former well (hardly a surprise considering its authors’ backgrounds) and Keyspace doing the latter well. The write results seem to indicate that Cassandra is doing something pretty effective for small writes, but whatever it is doesn’t work for large writes.

Moving on to the EC2 three-node results, things got even more confusing. The first thing to notice is that very little positive scaling was evident. Many of the results were flat, or even significantly worse on three nodes than on one. Having worked in distributed systems for a long time, I was disappointed but not surprised. What’s interesting is that I would expect that result in the environment with higher network latency, but in my experience EC2 does pretty well vs. Rackspace in that regard (for disk I/O it’s the exact opposite). Whatever the reasons, Keyspace continued to lead for cold reads and did (comparatively) well on everything else this time too. The only test where it didn’t outperform Cassandra and Voldemort was 10K writes, and that test had the least variation. Cassandra trailed the pack in every single test this time.

The Rackspace results are more straightforward. For one thing, they show that Rackspace’s provisioning is much better than Amazon’s for this kind of workload. I think having multi-CPU instances is a major part of that. With this different machine balance, a different kind of pattern emerges. Let’s ignore Riak for now, for reasons I’ll get to in a moment. Keyspace ran well ahead of the pack for small reads (both warm and cold) while Voldemort acquitted itself well for larger ones. Cassandra’s small-write advantage seemed to disappear, but instead it outperformed the others for large writes. Clearly, different applications would get different results depending on their data-size distribution and read/write mix.

OK, so why did I say to ignore Riak? To put it bluntly, because it didn’t really work as a distributed system and the results are for only a single node. This was a very straightforward build, but even when I followed the (meager) instructions for cluster deployment exactly, the “doorbell port” used to put the cluster together would never get opened. I was able to confirm (via net:ping) that Erlang processes running on both nodes could connect properly in the general case, but I’m not very conversant with Erlang and I wasn’t getting anything useful from the highly “unique” Riak logging facilities, so I eventually stopped butting my head against it and ran the single-node tests instead. I’m a bit peeved, because the time I wasted on that breakage meant that I didn’t have enough left to test LightCloud as well, and I really wanted to. Grrr. The single-node results were still abysmal, so that even with perfect scaling and no distribution overhead Riak would be unlikely to compare well even for its best case of small warm reads. There goes the “you should have compared apples to apples” excuse, because these are all apples in this basket. Riak proponents, beware. You’re on thin ice already.

At this point, I’m really liking Keyspace. It was one of the easiest to set up and to write an interface for, and its performance was pretty good compared to the others. More importantly, it was pretty consistently good. Another factor here is resource usage. I was pretty disturbed to see Cassandra consume a few percent of my CPU and a tenth of my memory (on Rackspace, where “top” and friends give really weird but slightly informative results) just sitting there, before I’d even made a single connection. Cassandra CPU usage on the servers spiked very high during tests, while Voldemort CPU usage on the client did likewise. Keyspace usage was certainly noticeable, but moderate compared to both of the others. Well done, Keyspace team!

As I’ve been at pains to point out, these results are pretty lame. They’re just the first step on a long road toward a seriously useful qualitative comparison across alternatives in this fairly new space. I invite others to build on and improve these results because, frankly, I’ve run out of time and this has already become too much of a distraction from other things I should be doing. Maybe with some teamwork we can make some comparisons – not just of performance but of persistence/consistency guarantees which definitely vary and even more definitely matter – that an application writer can use to predict which alternative will work best for their particular situation.

What Hole is Amazon RDS Filling?

As I’m sure many of you know, Amazon announced its Relational Database Service today. Frankly, it seems strange to me. After all these years aiding the adoption of alternatives to the stale old RDBMS, why change course now? The obvious explanation is that demand for RDBMSes still persists. People are still – often against all reason, considering the availability of alternatives – running them inside their EC2 instances. This is Amazon’s way of monetizing that demand, and along the way making things a little easier/nicer for users. Kudos to them. There’s nothing wrong with satisfying the customers, I always say. On the other hand, though, I wonder if there might be another little secret lurking in there. People running MySQL (or similar) inside their EC2 instances probably generate quite a bit of load on the network, and on EBS particularly. Could this be a way to shift that load a bit? Even if it doesn’t actually reduce total bandwidth consumption, separating the database load from everything else might allow for more predictable behavior and simplify provisioning. There’s nothing wrong with encouraging people to use resources more efficiently, either.

Either way, this is a welcome addition to the AWS arsenal. I don’t think it makes much of a statement about how people should design applications for the cloud (Werner Vogels oh-so-diplomatically seems to discourage it without actually saying it’s wrong) but maybe it says something about how people are actually using the cloud to deploy existing applications with little or no change. Anything we can do to encourage cloud adoption is good, I guess, even if we’ll eventually have to wean people away from technologies that are no longer appropriate.

Comparing Key/Value Stores

Over the last few days, I’ve been using some of my spare time to benchmark several of the key/value stores that are out there. I don’t think all that much of memory-only stores, because I think a store designed to be persistent can do everything a memory-based store can do by the simple expedient of running it on top of a RAM disk or filesystem. In a world where horizontal scaling is the name of the game, the argument that a memory-based store can avoid syscall overhead or some such is much less compelling than the argument that the human resource cost of deploying a whole separate infrastructure outweighs the material resource cost of deploying a few more servers. Whether you agree or disagree with that reasoning, what I set out to compare was a set of persistent stores. They varied in lots of other ways, though. Here are the competitors and how they distinguish themselves:

  • Tokyo Tyrant is a single-node data store which would require an extra layer to make it into the kind of multi-node data store I think most people really want/need (more about that later). It also has a very simple data model and practically no security model, but everyone says it’s fast so it’s a good baseline.
  • Chunkd is the lower layer of Project Hail, and is mostly equivalent to Tyrant except that it has a notion of multiple users and channel security (which was turned off for benchmarks).
  • Voldemort differs from Tyrant along the other axis – it still has a simple data model and no security features to speak of, though it does span nodes and supports multiple stores served through a single port.
  • Cassandra and Riak add richer (but different) data models to Voldemort’s functionality, but testing was for a very simple “one key maps to one value” model.
  • Tabled is the upper layer of Project Hail, and is more of an S3-equivalent large object store with full multi-user access control etc. I figured it would perform worse than the others (almost but not quite true) but I was curious to see how much of a penalty was associated with the extra functionality.

To test performance, I wrote a Python script which could speak to the interfaces for any of these stores. I had to create such an interface for chunkd using the ctypes module, but that was no big deal. I then ran tests as follows (a rough sketch of the client loop appears after the list):

  • Use two virtual machines on my desktop. I’ll re-compare with EC2 instances, including multiple servers where appropriate, soon.
  • Run ten client threads on one machine, vs. all the server parts on the other, for a minute.
  • Test write, warm read, and cold read (i.e. daemon killed, all caches dropped, daemon restarted).
  • Test 100-byte and 1000-byte values. I probably should have tested (in future will test) 10K or larger, but for now I was interested in two small sizes.
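
Here’s a rough sketch of what one client thread’s inner loop looked like – again illustrative rather than the actual script, with “store” standing in for whichever per-store interface gets plugged in; assume each wrapper has been normalized to put(key, value) and get(key):

    # Illustrative client loop: hammer the store for a fixed duration and
    # report the request count and elapsed time for the wrapper to tally.
    import os, random, time

    def run_client(store, op, value_size=100, duration=60, key_count=1000):
        value = os.urandom(value_size)        # 100- or 1000-byte payload
        count = 0
        start = time.time()
        while time.time() - start < duration:
            key = "key%d" % random.randrange(key_count)
            if op == "write":
                store.put(key, value)
            else:                             # warm or cold read
                store.get(key)
            count += 1
        elapsed = time.time() - start
        print("%d %.2f" % (count, elapsed))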

Here are the performance highlights. I’ll post full numbers after I re-run the tests on EC2 as I mentioned; all I’ll say for now is that the per-second rates for all tests ranged from low hundreds to low thousands.

  • Most of the candidates were two to four times faster for warm reads than for writes, except for Tyrant which was only about 10% faster.
  • Half of the candidates (Chunkd, Cassandra, Riak) were two to three times as fast for warm reads as for cold. The other half (Tyrant, Voldemort, Tabled) were only 0-20% faster.
  • Tyrant was the clear winner, more than 20% faster than its closest competitor even for the warm-read cases and showing very little drop-off for either writes or cold reads – making it up to four times as fast in some tests.
  • Chunkd almost kept up with Tyrant in the warm-read case, but fell far behind otherwise. It still took second place in every test, though.
  • After that, things slowed down pretty consistently. Cassandra was only 30-85% as fast as chunkd, Voldemort only 30-80% as fast as Cassandra (except for the larger cold-read test where it actually pulled ahead by 25%), and tabled only 50-90% as fast as Voldemort. Trailing the pack by a considerable margin was Riak, which was only 15-45% as fast as tabled.
  • By the time all was said and done, Tyrant was 8-24 times as fast as Riak.

The fairly obvious conclusion is that the performance differences between alternatives in this space are definitely big enough to be worth thinking about. A top-to-bottom ratio of 2:1 or 3:1 might be made up with tuning or by modifying applications to take advantage of each store’s special features, but at 24:1 that starts to seem unlikely. Like Leonard Lin, I find Tyrant’s performance lead compelling enough to think that a consistent-hashing layer on top of Tyrant might kick ass. Like Leonard, I’m not convinced that LightCloud really delivers on that promise – though it’s interesting enough that I’ll probably include it in my next round of comparisons.

As I said, I still need to re-test in a real cloud environment to validate these results, plus I’ll test more sizes. I’ll probably also drop tabled and Riak in favor of LightCloud and maybe Keyspace. I’ll definitely publish the benchmark program and any ancillary pieces (such as my Python interface for chunkd), and go into more detail about exactly how everything was set up. I’ve found way too little information in this area that’s even empirical, let alone quantitative, so maybe I can get something started here to fill the void.

Patenting Cloud Storage

Apparently, a fellow by the name of Mitchell Prust has sued Apple and SoftLayer for infringement of patents in the area of online storage.

Prust’s patents discuss basic storage principles commonly associated today with Cloud Storage – with a plurality of storage servers and storage systems providing customers with access to ‘virtual storage areas’ for remote data file storage. Digging a little deeper into the patents, Prust also claims his invention extends beyond data storage and into the realm of remote processing – in which case the servers and storage within a ‘virtual storage area’ can be used to process client data stored within them to run complementary tasks such as compression, encryption/decryption, and data conversion. In the patents, Prust highlights a number of different means for moving data to a ‘virtual storage area’ including WebDAV, HTTP, SMB (Server Message Block) and FTP.

On the face of it, these patents are pretty broad. However, one thing I’ve learned is that the only really important thing about a patent is the claims – not the description, and certainly not the abstract – so I went to look at those. The three patents involved are 6714968, 6735623, and 6952724. The earliest, ’968, seems the most general; ’623 is essentially the same with a specific mention of an email interface, and ’724 is essentially the same with a specific mention of WebDAV.

The first thing I notice is that all three (plus 20050278422 which has been filed but not issued) mention a “user assignable virtual storage area” in the all-important first claim. This is within my own technology area, and I still wouldn’t want to risk a ton of legal fees over a particular definition or distinction about what that does or does not cover. Does “assigned” refer to a permanent or transient kind of allocation? Is a directory a “virtual storage area” or does that term only apply to something more like a disk partition? Another element common to all three patents is the use of multiple interfaces to the networked storage, where one is fully integrated with the operating system (clearly meaning something like the UNIX VFS or Windows IFS layer) and the other is not.

It seems to me that making the “mkdir” command available to users would satisfy some definitions of “user assignable virtual storage area” while any system providing both NFS and FTP interfaces would match the multiple-interface criterion as well. Such systems have clearly existed since long before this patent was filed. Even with more restrictive definitions, the two necessary features were already so common that their combination was obvious enough to be unpatentable. That’s probably why the suit was filed in Patent Troll Central, and why the plaintiff chose only two targets. He has to walk a thin line, between claims so narrow that they don’t apply to the case and claims so broad that the prior art comes pouring in. Some might say that the two ranges overlap, leaving no space in between, but Prust is obviously hoping that a sympathetic East Texas judge will see it differently.

Overall, even in that venue, I wouldn’t give the case much of a chance. The claims are not only broad but vague as well, making them particularly poor examples of the IP-lawyer’s black art. Don’t quit your day job, Mitch.

Consistent Hashing

Consistent hashing is one of those techniques that has practically become standard in many high-scale environments, and yet somehow remains little known in the broader community. To underscore the first point, one could point to Facebook/Twitter/Digg using Cassandra, or to the many memcached deployments using the Ketama enhancements. To the second point, I’ve more than once mentioned consistent hashing to architect-level folks who, despite being generally informed in many areas, indicated that they’d never even heard the term before (let alone appreciated its significance). Weird. Anyway, one of the best descriptions I know of is by Tom Kleinpeter. Instead of trying to repeat what Tom says, I’ll pick up where he leaves off.

The first additional point I’d like to make is that the circle method is not really the only way to do consistent hashing. All you really need is some way to map both keys and servers into some space, and define some metric of distance within that space. It just happens that the circular method has many advantages, such as making it easy to find the closest server to a particular key using simple range checks and making it easy to find the second closest (third closest, etc.) server to provide redundancy. On the other hand, other methods such as the XOR metric used by Kademlia have properties that are desirable in a DHT. As always, the precise tool must be matched to the job at hand, but for the common case of a LAN-oriented cache or key/value store I think circles are a good choice.
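
If you’ve never seen the circle method in code, here’s a minimal sketch. The hash function and the details are arbitrary, and I’m using the convention that a key belongs to the nearest node point at or before its hash (wrapping around), which is why a failed node’s load falls to its immediate predecessor as discussed below:

    # Minimal sketch of the circle method -- for illustration, not production.
    import bisect, hashlib

    def ring_hash(s):
        # Map any string onto the ring (a 32-bit integer space).
        return int(hashlib.md5(s.encode()).hexdigest(), 16) & 0xffffffff

    class Ring(object):
        def __init__(self, nodes):
            # One point per node for now; see below for why you want more.
            self._points = sorted((ring_hash(n), n) for n in nodes)
            self._keys = [h for h, _ in self._points]

        def preference_list(self, key, n=3):
            # Owner first, then the next distinct nodes (moving backwards
            # around the ring) that would take over if it failed.
            limit = min(n, len(set(node for _, node in self._points)))
            idx = bisect.bisect_right(self._keys, ring_hash(key)) - 1
            result = []
            while len(result) < limit:
                node = self._points[idx % len(self._points)][1]
                if node not in result:
                    result.append(node)
                idx -= 1
            return result

    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.preference_list("user:1234"))   # owner plus two backups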

Next, I’d like to point out that the ability to have a single node show up at multiple points in a ring is incredibly powerful. Tom presents this as mostly a load-balancing trick, but its real strength lies in what happens when a node fails. If each node appears only once on the ring, then when it fails its immediate predecessor assumes 100% of its load. However, if it appears four times on the ring, then when it fails its load will be distributed across four predecessors. There’s a catch, though. It’s easy to fall into the trap of having a node appear N times by giving it N keys equally spaced around the ring. What happens to a node X when every node uses the same N and the same even spacing? Each of X’s keys is likely to be immediately preceded by a key for the same other node Y, which will then end up absorbing 100% of the load if X should fail. This is easily avoided, e.g. by using X’s “main” key as a seed to a pseudo-random number sequence which is then used to vary the placement of its remaining N-1 keys, but it’s still a problem evident in many consistent hashing implementations.
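
Here’s what I mean, extending the Ring sketch above: each node gets N points, with the node’s “main” key placed directly and the other N-1 placed by a PRNG seeded from it, so no two nodes’ points line up in lockstep:

    # Illustrative multi-point placement that avoids the "same predecessor
    # every time" trap described above.
    import random

    class MultiPointRing(Ring):
        def __init__(self, nodes, points_per_node=4):
            self._points = []
            for node in nodes:
                main = ring_hash(node)                # the node's "main" key
                self._points.append((main, node))
                rng = random.Random(main)             # seeded per node
                for _ in range(points_per_node - 1):  # the other N-1 points
                    self._points.append((rng.randrange(2 ** 32), node))
            self._points.sort()
            self._keys = [h for h, _ in self._points]

The lookup code doesn’t change at all, but now when a node disappears its keys scatter across several different neighbors instead of piling onto one.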

Lastly, I’ll add another warning: consistent hashing doesn’t account for correlated failures. Like any randomized resource assignment, there’s a certain probability that both the closest and the second-closest server keys for a particular data key will be in the same failure domain – e.g. nodes in the same rack. More precisely, this will happen with probability 1/rack_count. You could try to manipulate key assignment to avoid this, but it’s likely to be a mess especially if you’re also trying to avoid the “failed node’s load all goes to the same backup” problem discussed in the last paragraph. If you’re the sort of person who actually cares about making sure replication occurs to a different rack, you’re probably better off with a two-ring solution – one ring to determine rack assignment and one to determine server-within-rack assignment – instead of trying to “trick” a single-ring setup into doing the right thing.
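
Here’s a sketch of that two-ring arrangement, reusing the rings above (the rack and server names are made up):

    # One ring chooses racks, a per-rack ring chooses the server within each
    # chosen rack, so replicas always land in distinct racks.
    class RackAwareRings(object):
        def __init__(self, racks):
            # racks: dict mapping rack name -> list of server names
            self._rack_ring = MultiPointRing(list(racks))
            self._server_rings = {r: MultiPointRing(servers)
                                  for r, servers in racks.items()}

        def placement(self, key, copies=2):
            # Pick `copies` distinct racks, then one server within each.
            return [self._server_rings[rack].preference_list(key, 1)[0]
                    for rack in self._rack_ring.preference_list(key, copies)]

    racks = {"rack1": ["r1s1", "r1s2"], "rack2": ["r2s1", "r2s2"],
             "rack3": ["r3s1", "r3s2"]}
    print(RackAwareRings(racks).placement("user:1234"))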

Scaling Beyond Caches

Caching is a neat trick. It’s a really really neat trick, using one level of the storage hierarchy to hide or work around slowness in the next, and we wouldn’t be able to do much of what we do without caches. They’re still a trick, though, because the illusion that they create always fades eventually. Inconsistencies start to creep in, or the cost of maintaining consistency starts to outweigh the advantage of having caches in the first place. Even if you manage to avoid both of those problems, request rates or problem sizes will practically always increase faster than cache sizes until those caches are too small and you’d better have another solution ready. HPC folks have known about this for a long time. They don’t think much of CPU caches, because the kind of stuff they do blows through those in no time. Increasingly, their attitude toward in-memory caching of disk data is the same, because they tend to blow through those caches too. After a few rounds of this, it’s natural to stop thinking of caches as performance magic and start thinking of them as just another tool in the box.

Some folks seem to have trouble learning that lesson. Even if they’ve learned it about computational tasks, they still cling to a belief that caches can last forever when it comes to I/O. Most recently, I’ve seen this in the form of people claiming that modern applications don’t need fast I/O because they can just keep everything in (distributed) memory. The H-Store papers behind some of this advocacy make some good points regarding OLTP specifically, but OLTP is only one kind of computing. While OLTP workloads are still extremely common and important, they account for an ever-decreasing share of storage capacity and bandwidth. Let’s run some numbers to see how inapplicable those conclusions are to most people. A modern server such as the IBM x3650 M2 can accommodate 128GB in 2U. That’s barely 1TB per rack after you account for the fact that some of the space has to be used for a top-of-rack switch to provide bandwidth to that data, and that you have to replicate the in-memory data for some semblance of resistance to faults (though many would say even that is far from sufficient for serious work). The SiCortex systems were built for high density, and even they only got to about 8TB in something less than 3x the physical space. Those are piddling numbers for storage, when installations with petabytes are no longer uncommon. It’s also a very costly way to get that much capacity, paying for RAM and the processors that go with it and the power to run both. It better be worth it. Is it? Only if some small and easily identified part of your data is red-hot, and the rest is all ice-cold, and your operations on that data distribute nice and evenly so you don’t have communication hot spots that would make memory vs. disk irrelevant. That’s a pretty limited set of conditions.
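
Here’s the back-of-the-envelope behind that number, with my assumptions spelled out (a 42U rack, 1U for the top-of-rack switch, 2U servers at 128GB each, two-way replication):

    # Back-of-the-envelope for in-memory capacity per rack, using my own
    # deliberately generous assumptions.
    rack_units, switch_units, server_units = 42, 1, 2
    gb_per_server, replicas = 128, 2

    servers = (rack_units - switch_units) // server_units     # 20 servers
    raw_gb = servers * gb_per_server                          # 2560 GB
    usable_tb = raw_gb / float(replicas) / 1024               # ~1.25 TB
    print("%d servers, about %.2f TB of usable memory per rack"
          % (servers, usable_tb))

And that’s before the OS, the applications themselves, and any headroom take their cut.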

Still not convinced? OK, look at it this way. A lot of people have workloads more like Facebook than TPC-C, and it just so happens that there’s some information out there about how well caching has worked for Facebook. According to James Hamilton, as of last April Facebook was serving 475K images per second out of 6.5B total. Those numbers are certainly higher today – I’ve heard 600K and 15B – but they’re the basis for some other interesting numbers so let’s stick with them. Using 1TB of memory cache across 40 servers they get a 92% hit rate. That’s still multiple gigabytes per second that have to be served from disk – just for pictures, a year ago. For everything, today, they surely need dozens of servers to achieve the necessary bandwidth, and dozens of disks per server to achieve the necessary capacity. Facebook is far from typical, but also far from the “in-memory everything” crowd’s model and moving further away every day. Where Facebook is leading, many others will follow.

Sooner or later, your caches (including memory used as cache rather than as a functionally different operational resource) won’t be big enough to provide adequate performance if the next level in your storage hierarchy is too slow. Caches can still be useful as a way to optimize use of that next level, such as by avoiding redundant requests or turning random small-block I/O into large-block sequential, but they can’t hide its weaknesses. Parallelism and horizontal distribution are the real long-term keys to scalability, for I/O as well as for computation. If you really want scalability, you have to put parallelism at the center of your design with caches in a supporting role.

Book Review: Cloud Application Architectures

You won’t learn anything about cloud application architectures from this book. Author George Reese never even mentions (deep breath) memcached or other key/value stores, non-SQL/non-ACID databases, map/reduce, SOA, messaging services, ORMs, Terracotta/EhCache (or anything else in the Java or Ruby ecosystems), cloud storage, etc. He devotes a whole two sentences to sharding – one to mischaracterize it as a fault-containment (not performance) technique, and another to dismiss it out of hand. He discusses the problem of lock-based concurrency not scaling, but never mentions alternatives such as the actor model or STM; his “solution” is to stick everything in MySQL and let it manage concurrency. These are all startling voids in what an actual cloud application architect – let alone an author on the subject – should know.

So, what will you find in the book? There’s some pretty decent information about security, capacity planning, and disaster recovery. None of that is very specific to cloud, of course, and it tends to be operational rather than architectural knowledge, but he does a decent job. There’s also some good information about running applications on EC2, including deployment and management of your own AMIs. You’ll also find some strong but unsubstantiated (and mostly inaccurate) claims about Amazon EC2 reliability, virtualized I/O performance, and other subjects.

Lastly, you’ll find ads. The appendices written by Rackspace and GoGrid, and attributed as such, I find minimally useful but not particularly annoying. What really annoyed me was the blatant ad copy for Cleversafe, inserted without attribution into the text. There’s little evidence elsewhere in the text that Reese ever used Cleversafe or understands its technical underpinnings. Its essential nature is quite contrary to the DB-centric cargo cultism evident elsewhere in the book (which inspired my post on the subject). It’s totally out of place and inappropriate.

Perhaps I’d be less critical (except for the advertising part) if the title said Cloud Application Deployment. If you’re in IT and want to know how to take an old-school DB-centric low-scale application and deploy it in the cloud, you’ll find the book quite useful. If you’re an architect and want to know how to develop a high-scale cloud application, it might be worse than useless. It might actually misinform and mislead you, leading to an application that doesn’t work.

UPDATE: I still haven’t found any good books on this subject, but at least Amazon has a couple of good articles.

The Sidekick Disaster

By now, many of my readers have heard about the disaster involving Sidekick phones. Since I’m now seen as a cloud evangelist and this was a cloud service, I suppose I should comment. Some people are already using this incident to “prove” that you can’t trust cloud storage, which is silly. Others are trying to avoid the criticism by claiming it wasn’t a cloud, which I find even sillier. Yes, it was a cloud. No, the failure wasn’t because it was a cloud. It wasn’t even a failure of outsourcing, even though apparently T-Mobile had outsourced operations to Microsoft, which outsourced a SAN upgrade to Hitachi (cue “spontaneous” jeering from all of EMC’s “independent” bloggers). The data loss occurred because the SAN upgrade was started without an adequate backup, which is a failure of basic IT competence. Such failures could happen just as easily inside an entirely private data center, and often have. The real lessons here are pretty old ones:

  • When it comes to storing your data, redundancy and diversity still matter. You should never have only one copy of the data, under one entity’s control and vulnerable to that one entity screwing up.
  • Delegating functionality to someone else does not absolve you of responsibility for planning around failures.

Cloud computing and storage can be used to replace a large part of your traditional data center. They cannot replace a data-protection strategy. No vendor – product or service, cloud or otherwise – can do that for you. You still have to design and execute that strategy yourself, and oversee anyone involved in that execution. Where T-Mobile/Microsoft failed was in not paying attention to assets under their control. It was a failure of duty, not of technology, but we can still learn from it.