Amazon’s Outage

Apparently the AWS data center in Virginia had some problems today, which caused a bunch of sites to become unavailable. It was rather amusing to see which of the sites I visit are actually in EC2. It was considerably less amusing to see all of the people afraid that cloud computing will make their skills obsolete, taking the opportunity to drum up FUD about AWS specifically and cloud computing in general. Look, people: it was one cloud provider on one day. It says nothing about cloud computing generally, and AWS still has a pretty decent availability record (performance is another matter). Failures occur in traditional data centers too, whether outsourced or run by in-house staff. Whether you’re in the cloud or not, you should always “own your availability” and plan for failure of any resource on which you depend. Sites like Netflix that did this in AWS, by setting up their systems in multiple availability zones, were able to ride out the problems just fine. The problem was not the cloud; it was people being lazy and expecting the cloud to do their jobs for them in ways that the people providing the cloud never promised. Anybody who has never been involved in running a data center with availability at least as good as Amazon’s, but who has nevertheless used this as an excuse to tell people they should get out of the cloud, is just an ignorant jerk.

The other interesting thing about the outage is Amazon’s explanation.

8:54 AM PDT We’d like to provide additional color on what we’re working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones.

I find this interesting because of what it implies about how EBS does this re-mirroring. How does a network event trigger an amount of re-mirroring (apparently still in progress as I write this) so far in excess of the traffic during the event? The only explanation that occurs to me, as someone who designs similar systems, is that the software somehow got into a state where it didn’t know what parts of each volume needed to be re-mirrored and just fell back to re-mirroring the whole thing. Repeat for thousands of volumes and you get exactly the kind of load they seem to be talking about. Ouch. I’ll bet somebody at Amazon is thinking really hard about why they didn’t have enough space to keep sufficient journals or dirty bitmaps or whatever it is that they use to re-sync properly, or why they aren’t using Merkle trees or some such to make even the fallback more efficient. They might also be wondering why the re-mirroring isn’t subject to flow control precisely so that it won’t impede ongoing access so severely.
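
To make the dirty-bitmap idea concrete, here’s a rough sketch of the general technique – purely my own illustration, with made-up names and a made-up chunk size, and emphatically not how EBS actually works: track which regions changed while the peer was unreachable, re-mirror only those afterward, and fall back to copying the whole volume only if the tracking state itself is lost.

CHUNK_SIZE = 4 * 1024 * 1024   # granularity of dirty tracking (illustrative)

class MirroredVolume(object):
    def __init__(self, num_chunks):
        self.num_chunks = num_chunks
        self.dirty = set()          # chunks written while the peer was unreachable
        self.bitmap_valid = True    # False if we lost the ability to track writes

    def write(self, offset, data, peer_reachable):
        if not peer_reachable and self.bitmap_valid:
            self.dirty.add(offset // CHUNK_SIZE)
        # ... apply the write locally, forward it to the peer if reachable ...

    def chunks_to_resync(self):
        # With a valid bitmap, re-mirror only what actually changed; without
        # one, the only safe fallback is to re-mirror everything -- which is
        # exactly the kind of load amplification described above.
        if self.bitmap_valid:
            return sorted(self.dirty)
        return list(range(self.num_chunks))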

Without being able to look “under the covers” I can’t say for sure what the problem is, but it certainly seems that something in that subsystem wasn’t responding to failure the way it should. Since many of the likely-seeming failure scenarios (“split brain” anyone?) involve a potential for data loss as well as service disruption, if I were a serious AWS customer I’d be planning how to verify the integrity of all my EBS volumes as soon as the network problems allow it.

Using bottle.py with SSL

The other day, I needed to implement a very simple remote service for something at work. Everything I was doing seemed to map well onto simple HTTP requests, and I had played with bottle.py a while ago, so it seemed like a good chance to refresh that bit of my memory. Not too much later, I was adding @route decorators to some of my existing functions, and voila! The previously local code had become remotely accessible, almost like magic. That was cool and allowed me to finish what I was doing, but at some point I’ll need to make this – and some other things like it – more secure. So it was that I sat down and tried to figure out how to make this code do SSL. At first I thought this must be extremely well-trodden ground, covered in just about every relevant manual and tutorial, but apparently not. Weird. In any case, here’s what I came up with for a server and a test program.

What I’m mostly interested in here is authenticating the client, but I do both sides because doing only one side seems a bit rude. “You have to show me ID before you can talk to me, but I don’t have to show anything to you.” It kind of annoys me how most people obsess over authenticating servers while allowing clients to remain anonymous, so I’m not going to do the opposite. The key in any case is to wrap WSGIServer.server_activate, which seems to be the last thing that gets called before accept(), so that it can call ssl.wrap_socket with all of the appropriate configuration data. Then, if you want to authenticate clients, you need to wrap WSGIRequestHandler.handle and actually check the incoming client certificate there. Finally, both of these get wrapped up together in an adapter class for bottle.run to use. Clear as mud, huh?
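
Here’s a stripped-down sketch of the shape of it, assuming the old-style ssl.wrap_socket API (newer Pythons would use ssl.SSLContext instead); the certificate file names and the allowed commonName are placeholders, not anything from my actual code.

import ssl
from wsgiref.simple_server import make_server, WSGIServer, WSGIRequestHandler
import bottle

class SSLWSGIServer(WSGIServer):
    def server_activate(self):
        # Last hook before accept(): wrap the listening socket so that every
        # accepted connection is an SSL connection requiring a client cert.
        WSGIServer.server_activate(self)
        self.socket = ssl.wrap_socket(self.socket, server_side=True,
                                      keyfile='server.key',
                                      certfile='server.crt',
                                      ca_certs='ca.crt',
                                      cert_reqs=ssl.CERT_REQUIRED)

class CertCheckingHandler(WSGIRequestHandler):
    def handle(self):
        # The handshake already verified the CA signature; here we can also
        # check fields of the client certificate, e.g. the commonName.
        cert = self.connection.getpeercert()
        subject = dict(rdn[0] for rdn in cert.get('subject', ()))
        if subject.get('commonName') != 'trusted-client':
            return                  # silently drop unrecognized clients
        WSGIRequestHandler.handle(self)

class SSLAdapter(bottle.ServerAdapter):
    def run(self, app):
        server = make_server(self.host, self.port, app,
                             server_class=SSLWSGIServer,
                             handler_class=CertCheckingHandler)
        server.serve_forever()

@bottle.route('/hello')
def hello():
    return 'hello over mutual SSL\n'

if __name__ == '__main__':
    bottle.run(server=SSLAdapter, host='0.0.0.0', port=8443)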

That’s really all there is. It’s no stunning work of genius, that’s for sure, but maybe the next guy searching for this should-be-well-known recipe will be able to save some time.

Target as a Service

One of the dangers of deploying in a public cloud is that Evil People are very much attracted to public clouds. This is for two main reasons:

  • Public clouds are a target-rich environment for evil-doers. The IP ranges tend to be densely packed with servers, each of which might contain a large trove of easily-exploitable information. Contrast this with a public connectivity provider such as Verizon or Comcast, where any given IP address is much more likely to be either unused or assigned to an individual PC containing only a few pieces of exploitable information.
  • Most public clouds’ network virtualization and firewall setups make it hard to do proper intrusion detection, and providers’ own intrusion detection is likely to be of little use. Heck, you can’t even rely on most of them doing any intrusion detection since they won’t tell you.

This isn’t just random paranoia; it’s actual experience. I’ve been running my own little server in the Rackspace cloud for a little while. Here’s a little tally of failed ssh login attempts for that one machine over a mere week and a half.

382 195.149.118.43
136 61.83.228.112
103 89.233.173.91
60 195.149.118.43
54 210.51.47.177
53 74.205.222.27
37 200.27.127.95
36 74.205.222.27
32 61.83.228.112
20 74.126.30.189
12 222.122.161.197
10 74.126.30.189
8 210.51.47.177
6 202.85.216.252
2 78.110.170.108
2 222.122.161.197

That’s four attempts per hour, for a site that nobody really has any reason to know exists. Of course, they’re not evenly spaced. The most recent attack came at one attempt per two seconds. The attacks are also coming from all over; the top three addresses above are from Poland, Korea, and Germany respectively. It’s also worth looking at what accounts people are trying to break into.

532 root
18 ts
16 admin
15 postgres
14 test
14 oracle
14 nagios
12 mysql
11 shoutcast

Note that this is just one (particularly blatant) kind of attack, on one unremarkable machine, over a short time period. Imagine what the numbers must be for all attacks across a whole farm of machines, especially if early probes had shown encouraging signs of weak protection. I’m sure Rackspace makes some effort to defeat or at least detect intrusion attempts, but how effective can it be? If you were a cloud operator, what would you think of the following pattern?

  • Many ssh connections from the same external host, previously unknown to be associated with the internal one.
  • All connections spaced exactly two seconds apart.
  • Each connection made and then abandoned with practically zero data transfer (not even enough for a login prompt).
  • The same pattern repeated for other internal hosts belonging to different customers, either simultaneously or in quick succession.

That seems like one of the most glaringly obvious intrusion signatures I can think of, worthy of notifying someone. For all I know Rackspace does detect such patterns, and either cuts off or throttles the offending IP address, but there seems to be little sign of that. This is not to pick on Rackspace, either. I picked them for a reason, and I’ll bet the vast majority of other providers are even less secure. The real point is that if you’re in the cloud, even if it’s a good cloud, you need to be extra careful not to leave ports or accounts easily accessible to the sorts of folks who are aggressively probing your provider’s address space looking for such open doors.
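
For what it’s worth, picking that signature out of a standard sshd auth log doesn’t take much code. Here’s a rough sketch – the log-format regex and the thresholds are assumptions, and this is nowhere near a finished IDS – that flags any source address with lots of failed logins spaced at nearly constant intervals.

import re
from collections import defaultdict
from datetime import datetime

# Matches lines like: "Apr 21 08:54:01 host sshd[123]: Failed password for root from 1.2.3.4 ..."
FAIL_RE = re.compile(r'^(\w{3}\s+\d+ \d\d:\d\d:\d\d).*sshd.*Failed password.*from ([\d.]+)')

def suspicious_sources(log_lines, min_attempts=20, max_jitter=1.0):
    attempts = defaultdict(list)
    for line in log_lines:
        m = FAIL_RE.match(line)
        if m:
            when = datetime.strptime(m.group(1), '%b %d %H:%M:%S')
            attempts[m.group(2)].append(when)
    flagged = []
    for ip, stamps in attempts.items():
        if len(stamps) < min_attempts:
            continue
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        # Nearly constant spacing (say, one attempt every two seconds) is the
        # fingerprint of an automated scanner, not a fat-fingered user.
        if max(gaps) - min(gaps) <= max_jitter:
            flagged.append(ip)
    return flagged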

Vacation Email

Maybe this feature already exists, and I never saw it mentioned anywhere. When somebody goes on vacation, they often set a vacation auto-reply to let people know that their message won’t be read until the vacation ends. Some people think this is good etiquette; some think it’s bad. Some employers require it; others forbid it. Anyway, let’s say that you work in a place where a lot of your email is automatically generated, such as by bug trackers or source-control systems. It seems such a waste for my mailer to attempt a vacation auto-reply which will just go into a black hole somewhere. Wouldn’t it be better to prevent such an auto-reply from being generated? Something like this:

X-Allow-Auto-Reply: no

I know there are some headers, like List-Xxx, that can indicate mail is from a list and give a pretty strong hint that an auto-reply would be pointless, but that’s not quite the same thing. Not all auto-generated mail is generated by a list manager or destined for a list, for one thing. For another, I’m sure many people would like to have their mail program insert such a header into every email they send even though they’d be perfectly capable of receiving a reply. Surely I’m not the only person to have noticed this easy-to-eliminate waste, but there doesn’t seem to be any well publicized solution. Can anyone point me to one that might actually be implemented somewhere?
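
To make it concrete, here’s a rough sketch of how an auto-responder could honor such a header alongside the existing hints like List-Id, Precedence, and the RFC 3834 Auto-Submitted marker. The X-Allow-Auto-Reply name is just my proposal above, not an existing standard, and the addresses and reply text are placeholders.

import email
import smtplib
from email.message import EmailMessage

def maybe_auto_reply(raw_message, vacation_address='me@example.com'):
    msg = email.message_from_string(raw_message)
    # Honor the proposed opt-out header, plus the usual list/bulk hints.
    if msg.get('X-Allow-Auto-Reply', 'yes').strip().lower() == 'no':
        return
    if msg.get('List-Id') or msg.get('Precedence', '').lower() in ('bulk', 'list'):
        return
    if msg.get('Auto-Submitted', 'no').lower() != 'no':
        return                      # never auto-reply to auto-generated mail
    reply = EmailMessage()
    reply['From'] = vacation_address
    reply['To'] = msg.get('Reply-To', msg['From'])
    reply['Subject'] = 'Auto-reply: ' + msg.get('Subject', '')
    reply['Auto-Submitted'] = 'auto-replied'    # RFC 3834 marker
    reply.set_content("I'm on vacation and won't read your mail until I return.")
    with smtplib.SMTP('localhost') as server:
        server.send_message(reply)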

Scalable Flow Control

One of the problems that seems to occur again and again when “computing at scale” is some sort of server getting overwhelmed by requests from relatively much more numerous clients. Sooner or later, every kind of server has to deal with running out of buffers or request structures or (if the designers were fools) threads/processes, or just about any other kind of resource. Having the entire system grind to a halt because one server couldn’t handle overload gracefully and instead died a messy death is both unpleasant and unnecessary. It’s unnecessary because there are solutions that work. Unfortunately, there are other solutions that don’t work, but get tried anyway because they seem easier.

One such non-solution is to have lots and lots of servers, and spread the load between them as evenly as possible. Most people who have actually tried this have eventually realized that spreading the load that well is very hard if not impossible. Sooner or later, an access pattern appears that causes one server to get overloaded. Then it fails, increasing load on its peers (not just the shifted operational load but now the recovery load as well) and quite likely causing them to fail as well, and so on. If this sounds a bit like the northeast US power blackout in 2003, it should. Even if your load-balancing is really good and you’re committed to running your servers at 10% of capacity, a physical or configuration error could leave you with this sort of imbalance/failure cascade. The solution is to handle the condition, not avoid it, and that means some form of flow control. In other words, you have to make requests queue at the (more numerous) clients instead of within the servers.

Flow control can be implemented in many ways. It can be implemented at a low level to maximize generality and code reuse or at a high level to maximize efficiency and applicability to all kinds of resources. It can be implemented via credits that clients must hold or obtain before sending a request, or via “slow down” messages that are sent from servers to clients only when needed. The preachers of statelessness would say that the latter approach is less stateful and therefore preferable, but I think they’re mostly deluding themselves. For one thing, a lot of “stateless” servers have really just moved their session-layer state somewhere else (e.g. a database maintained by the application layer) instead of truly eliminating it. For another, the state after receiving a “slow down” message is still state that must be maintained. If clients can simply ignore such a message, or “forget” that they got it, you’ve achieved nothing whatsoever. If they’re bound to respect it, and especially if the server attempts to enforce it, then you’re just as stateful as you would be with full credit accounting but limited to only zero or one credit.

So, if you’re going to use a credit approach, how should credit be allocated? Again, many will be tempted to use non-scalable approaches. Often the easiest thing to do is to allocate a worst-case set of resources (and associated credit) to every client. That can work OK with few clients, but rapidly leads to unacceptable levels of resources being allocated but idle when the node counts stretch into the hundreds or thousands. The opposite end of the spectrum is to require that all resources and associated credit be explicitly obtained, but this can lead to unacceptable first-request latency and performance anomalies as each batch of credit is consumed. In my experience, a hybrid approach works better: preallocate just enough resources and credit for each client to keep it happy while it explicitly requests and obtains more from a common pool. The common pool can then be large enough to satisfy the maximum worst-case load for the entire system, which is often much less than the sum of the worst-case numbers for each node, allowing support for many more clients at the same resource level. Also, the allocation requests and replies can often be piggy-backed on other messages so they carry little additional cost.

The one remaining problem is how credit gets returned to the common pool when a client no longer needs it. This can be driven either by the client (when it recognizes that it no longer needs the resources) or by the server (when it needs to replenish the common pool). Since it’s generally hard to tell when a client doesn’t need credit any more, the client-driven approach usually involves giving up credit after a timeout. The server-driven approach, on the other hand, requires implementation of a credit-revocation exchange parallel to the credit-granting one. It’s even possible to combine approaches, and in fact I usually do so that a client might give up credit either on its own initiative or in response to a server message while reusing most of the same code for both cases.

With this kind of scheme – small amounts of per-client credit plus explicit requests and revocations of any credit over that amount, with credit-level changes potentially driven by either side – it’s possible to avoid server overload without either starving clients or wasting server resources. It’s not really as complicated as it might sound, and can be implemented with a negligible impact on common-case performance.
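
Here’s a bare-bones sketch of the server side of such a scheme, just to show how little machinery it really takes (the names and numbers are mine, purely for illustration): every client gets a small baseline allotment, anything beyond that comes out of a common pool, and credit flows back either because the client released it or because the server asked for it.

import threading

class CreditManager(object):
    def __init__(self, per_client=4, pool_size=1024):
        self.per_client = per_client      # baseline credit preallocated per client
        self.pool = pool_size             # shared pool for everything beyond that
        self.extra = {}                   # client_id -> credit granted from the pool
        self.lock = threading.Lock()

    def register(self, client_id):
        # The baseline keeps a new client going while it asks for more.
        with self.lock:
            self.extra[client_id] = 0
        return self.per_client

    def request(self, client_id, amount):
        # Grant as much of the requested extra credit as the pool allows; the
        # grant can be piggy-backed on an ordinary reply message.
        with self.lock:
            grant = min(amount, self.pool)
            self.pool -= grant
            self.extra[client_id] += grant
        return grant

    def release(self, client_id, amount):
        # Client-driven return, e.g. after the credit has sat idle for a while.
        with self.lock:
            returned = min(amount, self.extra[client_id])
            self.extra[client_id] -= returned
            self.pool += returned

    def revocation_target(self, client_id, needed):
        # Server-driven replenishment: decide how much to ask this client to
        # give back when the pool runs low; the actual revocation exchange is
        # protocol-specific and reuses the same release path above.
        with self.lock:
            return min(needed, self.extra[client_id])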

Cloud Computing and Teenage Sex

Don’t blame me for the comparison. It’s actually Walter Pinson’s.

It was once said back in the early ‘90s that “Client/server computing is a little like teenage sex – everyone talks about it, few actually do it, and even fewer do it right. Nevertheless, many people believe client/server computing is the next major step in the evolution of corporate information systems.”

Can the same be said about cloud computing, today?

I contend that cloud computing is like teenage sex in another way: teenagers act like they invented sex, annoying their elders who thought that they invented it back when they themselves were teenagers. As Pinson’s reference to client/server computing makes clear, there’s a lot about cloud computing that’s not new. There are aspects that go back even further. When people talk about how to bill for cloud computing, or how to insulate users from one another, it all starts to sound a lot like the old time-sharing days. It’s time-sharing on a new kind of system, but it’s time-sharing nonetheless.

There are people creating new technology in the cloud computing space, to be sure. (This is where the teenage-sex analogy breaks down.) I used to be one of them, and might be again in the not-too-distant future. There are far more people merely reinventing old technology in the cloud computing space. If anyone really wants to understand cloud technology and how it might best be deployed to create value, I think it’s important to understand which parts are actually new and how they’re new vs. what parts have already been done or tried.

P.S. While we’re talking about cloud analogies, Bruce Sterling had another good one.

Okay, “webs” are not “platforms.” I know you’re used to that idea after five years, but consider taking the word “web” out, and using the newer sexy term, “cloud.” “The cloud as platform.” That is insanely great. Right? You can’t build a “platform” on a “cloud!” That is a wildly mixed metaphor! A cloud is insubstantial, while a platform is a solid foundation! The platform falls through the cloud and is smashed to earth like a plummeting stock price!

There’s a lot of other randomness in there too, but the fiction author’s comparison of cloud-computing fiction to financial-market fiction is worth thinking through.

FUD at 10 Gigabits Per Second

At my last job, I had to work with InfiniBand. Believe me, this did not lead to an enduring love of IB. Before InfiniBand, I had worked with Fibre Channel and seen how overburdened it was with every vendor’s favorite feature or format or protocol variation, often with some little bit hidden somewhere to tell you which of several possible (and mutually incompatible) behaviors you were expected to exhibit in response. Compared to IB, FC is a model of streamlined simplicity. How’s that for scary? Nonetheless, now that all those thousands of person-hours have been poured into it, IB does actually manage to deliver somewhat on its original promise of high bandwidth and low latency at low cost.

So along comes 10-gigabit Ethernet (10GbE), which is so many levels removed even from the thing that people called Ethernet after original Ethernet had been dead and buried that nothing remains but the brand name. It seems that some folks are sure it’s going to displace IB as a cluster interconnect Any Day Now. Hitching themselves to that belief, they’ve started flinging FUD about IB’s “misleading” bandwidth numbers. Here’s one of the more egregious examples.

We will tear this black cable bandit down to size one claim at a time. First they assert that it’s 20Gbps, how about 12Gbps on it’s best day with all the electrons flowing in the same direction. Infiniband employs what is know as 8b/10b encoding to put the bits on the wire. For every 10 signal bits there are 8 useful data bits. Ethernet uses the same method, the difference is that Ethernet for the past 30 years has advertised the actual data rate while Infiniband promotes the 25% larger and useless signal rate. Using Infiniband math Ethernet would then be 12.5Gbps instead of the 10Gbps it actually is. So using Ethernet math Infiniband’s Double Data Rate (DDR) is actually only 16Gbps and not the 20Gbps they claim.

Apparently, according to “10GbE math” 16Gb/s is less than 10Gb/s. Spare me. DDR IB is at approximate price parity with 10GbE, and still 60% faster than 10GbE – with QDR products already available. How does that make 10GbE the superior choice, again? Wait, you say. Those are only nominal bandwidths, right? True enough, and just as true for 10GbE as for IB. It would be a little disingenuous to point out that IB doesn’t really achieve 16Gb/s except “on it’s best day with all the electrons flowing in the same direction” without also pointing out that 10GbE is subject to the same effects (and the vast majority of cards according to 10GbE.net’s own price lists aren’t even physically capable of more than 13Gb/s across two ports).
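
If the encoding argument sounds confusing, the arithmetic is simple enough to spell out (nominal rates only, of course):

ib_ddr_signal_rate = 20.0                        # Gb/s, 4x DDR signalling rate
ib_ddr_data_rate = ib_ddr_signal_rate * 8 / 10   # 8b/10b encoding -> 16 Gb/s of data
tenge_data_rate = 10.0                           # Gb/s, what 10GbE advertises

print(ib_ddr_data_rate)                      # 16.0
print(ib_ddr_data_rate / tenge_data_rate)    # 1.6, i.e. DDR IB is still 60% faster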

The writing style on 10GbE.net is strikingly similar to that of a certain Cisco employee. Instead of launching all this FUD from behind a screen of anonymity, would it be too much to ask that the author be a little more honest about his associations? When he can show repeatable, verifiable results indicating that DDR IB doesn’t still trounce 10GbE at the same price point, then we can have a real discussion about cluster interconnects.

BitTorrent over UDP

The hot topic in the blogosphere right now is about BitTorrent Inc. switching to a UDP-based protocol for their popular uTorrent client. Most of the links on this subject ultimately lead back to Bittorrent declares war on VoIP, gamers at El Reg. Having talked to Richard Bennett before, I’m well aware of his penchant for saying outrageous things to get attention, but most of his critics are behaving even worse. For example, here’s Janko Roettgers.

Bennet’s piece is based on a belief that UDP traffic is “aggressive” and uncontrollable, whereas TCP is the nice and proper protocol that can be easily managed. This notion ignores the basic fact that P2P developers, in order to make the protocol work at all, need to implement TCP-like functionalities on top of UDP, one of which includes congestion control. You simply can’t operate a P2P client that eats up all of its users’ bandwidth, much less build a successful business model on top of it.

That’s an unfortunately, not to say selfishly, client-centric perspective. For one thing, eating up all of the user’s own bandwidth is not the issue; crowding out other users’ traffic is. For another, Roettgers completely ignores what happens to packets in between two PCs on the internet. The fact is that all those routers have mechanisms in place to do congestion control, and – while there are proposals for TCP-friendly rate control out there – many people have traditionally turned to UDP for the specific purpose of bypassing congestion control. As George Ou pointed out a while ago, P2P applications tend to “game the system” in effect and probably by intent. I know from my own conversations with Bram Cohen that he has never liked TCP, so it doesn’t really seem all that unlikely that the switch to UDP is quite deliberately intended to avoid congestion control – despite the inevitable “collateral damage” to non-BitTorrent users – which makes it quite antisocial. Bennett offers one further reason to believe this.

Upset about Bell Canada’s system for allocating bandwidth fairly among internet users, the developers of the uTorrent P2P application have decided to make the UDP protocol the default transport protocol for file transfers.

I don’t know where Bennett got the Bell Canada angle, and considering the source I’d be interested in any pointers to relevant statements by Cohen or anyone else at BitTorrent, but it certainly seems plausible. What I’d be most interested in, though, is any credible alternative explanation. “It will make BitTorrent downloads faster without affecting anyone else” is just another way of saying there is such a thing as a free lunch. The claims about BitTorrent’s uTP doing congestion control better than TCP are suspect too, since details on uTP don’t seem to be available for us to review and the one detail I’ve seen mentioned – using latency to detect congestion – doesn’t exactly represent the current state of knowledge among people who really understand congestion control. Can anyone explain, without hand-waves and exaggeration, why uTorrent is switching to UDP other than to consume a greater share of bandwidth than existing mechanisms to preserve robustness and fairness in the internet would allow?
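
For anyone wondering what “using latency to detect congestion” even means, here’s a toy sketch of the general idea – a sender that shrinks its window as queueing delay builds up, well before packets are actually dropped. This is a generic illustration with made-up constants, not uTP’s actual algorithm (which, again, we can’t review).

TARGET_DELAY = 0.025     # seconds of extra queueing delay we're willing to add
GAIN = 1.0               # how aggressively the window reacts

def adjust_window(window, base_delay, current_delay, bytes_acked, mss=1500):
    # Delay above the historical minimum is assumed to be queueing in routers.
    queueing_delay = current_delay - base_delay
    off_target = (TARGET_DELAY - queueing_delay) / TARGET_DELAY
    # Grow while we're under the delay target, shrink once we're over it.
    window += GAIN * off_target * bytes_acked * mss / window
    return max(window, mss)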

Free But Broken

No, this isn’t about open source, though frequent readers could be excused for thinking it would be. Denver International Airport’s free wi-fi is the biggest piece of junk ever. First they make you sit through an ad before you can use it. Then they wrap everything with their own stupid frame, in the process breaking things like Google Reader even if you Ad-Block it. Then they do allow ssh, but it (and, I’ll guess, anything else encrypted so they can’t see inside) is so slow that it’s practically unusable. Thanks a lot, jerks. I already have a subscription to Boingo, and if you hadn’t bought a monopoly I’d be using that productively. It’s too bad you don’t have anyone competent or ethical working for you.

Hype vs. FUD

One of the computing trends I’ve been watching for a while is so-called “cloud computing” which is really just the latest name for a kind of distributed computing that has been around in nearly identical form for just under a decade and in not-too-different forms for at least twice that long. Heck, I was working on one of the keystones of cloud computing – globally distributed storage – back in 2000, so when I react to the current hype wave with terms like “clown computing” it’s hardly because I’m afraid of something new.

In Cloud Computing is Scary – But the FUD Has to Stop, Dan Morrill complains about the FUD supposedly being directed at cloud computing. Well, Dan, with any new (or even supposedly new) technology there will be hype-mongers at one end and FUD-slingers at the other, with most of the computing community in between. Those with a vested interest in promoting a technology tend to tar everyone less enthusiastic than themselves with the “FUD” brush, just as those with a vested interest in suppressing it tend to tar everyone more enthusiastic with the “hype” brush. In this case the fact is that there are a lot of people who can’t even define cloud computing making all sorts of grandiose and often blatantly false claims. Saying that cloud computing is inherently flawed, that it can never work, would be FUD; pointing out that the claims being made by or about specific vendors and products remain unproven, or that there are still problems to be addressed, is just healthy skepticism. By portraying that skepticism as FUD, you only put yourself further on the “hype” end of the spectrum.

It is long past time to continue with the same old tired refrain of “no” and move on to where business is going.

It is time to start embracing where business is going, and trying to make sure that they are doing it in the safest way possible.

Where business is going, eh? Got any evidence of that? No, of course not. That’s where you would like business to go, but that’s not at all the same thing. Such grandiose “this is the wave of the future so don’t miss it” rhetoric does very little to allay people’s legitimate concern that it’s really a wave of hype. You’d be better off presenting cloud computing as a still-to-be-embraced opportunity, not a fait accompli in the business world.

Business has taken to virtualization in a big way, which I think is misguided for a whole different set of reasons (I believe it’s better to build and deploy smaller servers which can be combined into larger complexes instead of larger ones which then have to be sliced up). There is some correspondence and synergy between virtualization and cloud computing, but I can’t recall any cloud computing proponents articulating that connection as a coherent and usable business strategy. Riding on virtualization’s coat-tails isn’t enough. Some day very soon, somebody in the cloud computing camp needs to do a better job of explaining their Grand Concept’s very own value proposition separate from virtualization.

There are very few information security experts in cloud computing.

What security professionals need to be doing rather than creating their own FUD is work out ways to make it safer.

There might be very few information-security experts in the inbred cabal of people who push the “cloud computing” brand in blogs and such, but there are plenty of people who have been working at the intersection of security and distributed computing for years. Do you think the people behind Amazon, or Allmydata.org Tahoe, or Mozy (across the street from the folks at RSA), or Iron Mountain, don’t have a few clues about this stuff? I know many of them, have worked with some, and I know you’d be dead wrong. Securing data across the net is a well-studied problem. So is securing computation across the net, though that’s not my own specialty. It doesn’t mean all the answers are known, but it’s just not true that such expertise is rare or rarely applied.

it is time for information security folks to step up to the plate and get smart on how the technology works.

The best bet right now for the security engineer is to work through the process, and get smart now so that management can benefit from what you have learned.

No, maybe it’s time for cloud computing folks to get smart on how security technology works. Don’t try to push the burden of fixing your problems onto another community, and especially don’t try to hint that they’re “not smart” as you do it. That’s no way to get the help you need. If you cloud computing folks are such great innovators, take some responsibility for learning what’s already out there and using it to innovate your own solutions. When you act as though you invented the greatest thing ever and everyone else needs to catch up, you come across just like teenagers who act like they invented music or sex and that’s just really annoying. Customers don’t like to deal with annoying vendors.

There’s nothing wrong with consciousness-raising but, especially in this economic environment, people are suspicious of evangelists whose promises are incommensurate with their ability to demonstrate real working product with real business value. If you don’t want to sour everyone on the whole idea of cloud computing or anything like it for the next ten years, dial down the marketing and dial up the technical progress.

UPDATE: The Onion says Recession-Plagued Nation Demands New Bubble To Invest In.

A panel of top business leaders testified before Congress about the worsening recession Monday, demanding the government provide Americans with a new irresponsible and largely illusory economic bubble in which to invest.

“Perhaps the new bubble could have something to do with watching movies on cell phones,” said investment banker Greg Carlisle of the New York firm Carlisle, Shaloe & Graves. “Or, say, medicine, or shipping. Or clouds.”