Canned Platypus

Making the world better, one byte at a time.

Archive for the ‘tech’ Category

I love programming. I love thinking about algorithms and data structures. I love writing code, rearranging code, talking about code. I even love testing and debugging and documenting code. (This is not to say I do all of these things as consistently as I can. There are still only 24 hours in a day and so one must prioritize.) Sometimes I think of getting out of this field, though, because so much of working as a programmer nowadays has nothing to do with any of the things I love, and it seems to be getting worse. Nobody loves meetings and bureaucracy and such, but that’s not what I’m talking about.

I hate spending half my time dealing with build systems, source-control systems, package managers, and such. There are too many out there, they all suck, everybody has their favorite one and their favorite way of using it, and they’re not at all shy about ramming their preferences down your throat . . . which brings me to my real point. I hate programmers. Hot damn, but we are a noxious breed, aren’t we? I’m tired of the backstabbing, the trashing each other’s work, the holier-than-thou attitude from the GNU types, the rampant sexism, the bike-shedding, the endless effort to do and re-do all the fun stuff while dumping as much work as possible onto one’s peers, and on and on and on. I know I’ve exemplified many of these sins myself, I don’t need anyone else to tell me that, but if I made it my life’s goal to be as much of a jerk as possible I’d still find myself outdone just about every day by people who aren’t even at their worst.

Of course, I don’t know what else I’d do that pays the bills, so you all are stuck with me, but come on, people. Let’s stop sucking all the fun out of this profession.

In my last post, I described several common data-loss scenarios and took people to task for what I feel is a very unbalanced view of the problem space. It would be entirely fair for someone to say that it would be even more constructive for me to explain some ways to avoid those problems, so here goes.

One of the most popular approaches to ensuring data protection is immutable and/or append-only files, using ideas that often go back to Seltzer et al.’s log-structured filesystem paper in 1993. One key justification for that seminal project was the observation that operating-system buffer/page caches absorb most reads, so the access pattern as it hits the filesystem is write-dominated and that’s the case for which the filesystem should be optimized. We’ll get back to that point in a moment. In such a log-oriented approach, writes are handled as simple appends to the latest in a series of logs. Usually, the size of a single log file is capped, and when one log file fills up another is started. When there are enough log files, old ones are combined or retired based on whether they contain updates that are still considered relevant – a process called compaction in several current projects, but also known by other names in other contexts. Reads are handled by searching through the accumulated logs for updates which overlap with what the user requested. Done naively, this could take linear time relative to the number of log entries present, so in practice the read path is often heavily optimized using Bloom filters and other techniques so it can actually be quite efficient. This leads me to a couple of tangential observations about how such solutions are neither as novel nor as complete as some of their more strident champions would have you believe.

  • The general outline described above is pretty much exactly what Steven LeBrun and I came up with in 2003/2004, to handle “timeline” data in Revivio’s continuous data protection system. This predates the publication of details about Dynamo in 2007, and therefore all of Dynamo’s currently-popular descendants as well.
  • Some people seem to act as though immutable files are always and everywhere superior to update-in-place solutions (including soft updates or COW), apparently unaware that they’re just making the complexity of update-in-place Somebody Else’s Problem. When you’re creating and deleting all those immutable files within a finite pool of mutable disk blocks, somebody else – i.e. the filesystem – has to handle all of the space reclamation/reuse issues for you, and they do so with update-in-place.

Despite those caveats, the log-oriented approach can be totally awesome, and designers should generally consider it first, especially when lookups are by a single key in a flat namespace. You could theoretically handle multiple keys by creating separate sets of Bloom filters etc. for each key, but that can quickly become unwieldy. It also makes writes less efficient, and – as noted previously – write efficiency is one of the key justifications for this approach in the first place. At some point, or for some situations, a different solution might be called for.
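To make the moving parts concrete, here is a minimal sketch of the log-oriented approach described above: capped append-only segments, newest-first reads, and compaction. It is a toy under stated assumptions, not how any particular project does it. The class name `LogStore`, the `segment_cap` parameter, and the in-memory lists are all made up for illustration, and a plain Python set stands in for each segment's Bloom filter (a real Bloom filter is smaller but can return false positives).

```python
class LogStore:
    """Toy log-structured key/value store: append-only segments plus compaction."""

    def __init__(self, segment_cap=4):
        self.segment_cap = segment_cap   # max entries per segment (hypothetical cap)
        self.segments = [[]]             # oldest first; the last one is the active log
        self.key_sets = [set()]          # per-segment key sets, standing in for Bloom filters

    def put(self, key, value):
        if len(self.segments[-1]) >= self.segment_cap:
            # The active log is full: seal it and start a new one.
            self.segments.append([])
            self.key_sets.append(set())
        self.segments[-1].append((key, value))   # writes are always appends
        self.key_sets[-1].add(key)

    def get(self, key):
        # Search newest to oldest so the most recent update wins.
        for segment, keys in zip(reversed(self.segments), reversed(self.key_sets)):
            if key not in keys:
                continue                 # filter says "not here": skip the scan
            for k, v in reversed(segment):
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all sealed segments, keeping only the newest value per key.
        sealed = self.segments[:-1]
        if not sealed:
            return
        latest = {}
        for segment in sealed:           # oldest first, so later entries overwrite
            for k, v in segment:
                latest[k] = v
        self.segments = [list(latest.items()), self.segments[-1]]
        self.key_sets = [set(latest), self.key_sets[-1]]
```

Notice that the filter bookkeeping happens on every `put`. Add a second lookup key and you need a second set of filters updated on the same path, which is exactly the write-efficiency cost mentioned above.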

The other common approach to data protection is copy on write or COW (as represented by WAFL, ZFS, or btrfs) or its close cousin soft updates. In these approaches, blocks are updated in place, but with very careful attention paid to where and/or when individual block updates actually hit disk. Most commonly, all blocks are either explicitly or implicitly related as parts of a tree. Updates occur from leaves to root, copying old blocks into newly allocated space and then modifying the new copies. Ultimately all of this new space is spliced into the filesystem with an atomic update at the root – the superblock in a filesystem. It’s contention either at the root or on the way up to it that accounts for much of the complexity in such systems, and for many of the differences between them. The soft-update approach diverges from this model by doing more updates in place instead of into newly allocated space, avoiding the issue of contention at the root but requiring even more careful attention to write ordering. Here are a few more notes.

  • When writes are into newly allocated space, and the allocator generally allocates sequential blocks, the at-disk access pattern can be strongly sequential just as with the more explicitly log-oriented approach.
  • The COW approach lends itself to very efficient snapshots, because each successive version of the superblock (or equivalent) represents a whole state of the filesystem at some point in time. Garbage collection becomes quite complicated as a result, but the complexity seems well worth it.
  • There’s a very important optimization that can be made sometimes when a write is wholly contained within a single already-allocated block. In this case, that one block can simply be updated in place and you can skip a lot of the toward-the-root rigamarole. I should apply this technique to VoldFS. Unfortunately, it doesn’t apply if you have to update mtime or if you’re at a level where “torn writes” (something I forgot to mention in my “how to lose data” post) are a concern.
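The leaf-to-root copying and the snapshot trick can be sketched in a few lines. This is only an illustration of the general COW idea, not the layout used by WAFL, ZFS, btrfs, or VoldFS. `FANOUT`, `DEPTH`, `Volume`, and the function names are hypothetical, immutable tuples play the role of blocks that are never overwritten, and a single Python reference assignment stands in for the atomic superblock update.

```python
FANOUT = 4   # children per interior node (arbitrary for this sketch)
DEPTH = 3    # tree levels above the leaves, so FANOUT**DEPTH = 64 blocks


def cow_write(node, block_no, data, depth=DEPTH):
    """Return a new subtree with block_no set to data; the old subtree is untouched."""
    if depth == 0:
        return data                              # new leaf
    children = list(node) if node is not None else [None] * FANOUT
    span = FANOUT ** (depth - 1)                 # blocks covered by each child
    idx, rest = divmod(block_no, span)
    children[idx] = cow_write(children[idx], rest, data, depth - 1)
    return tuple(children)                       # fresh copy of this interior node


def cow_read(node, block_no, depth=DEPTH):
    """Walk from the given root down to one block; None means never written."""
    if node is None or depth == 0:
        return node
    span = FANOUT ** (depth - 1)
    idx, rest = divmod(block_no, span)
    return cow_read(node[idx], rest, depth - 1)


class Volume:
    def __init__(self):
        self.root = None          # stands in for the superblock's root pointer
        self.snapshots = {}

    def write(self, block_no, data):
        if not 0 <= block_no < FANOUT ** DEPTH:
            raise IndexError(block_no)
        # Build the new path off to the side, then splice it in with one swap.
        self.root = cow_write(self.root, block_no, data)

    def snapshot(self, name):
        self.snapshots[name] = self.root   # a snapshot is just a saved root

    def read(self, block_no, snapshot=None):
        root = self.root if snapshot is None else self.snapshots[snapshot]
        return cow_read(root, block_no)
```

Only the nodes on the path to the changed block are copied; every other subtree is shared between the old and new roots. That sharing is why snapshots are cheap, and also why garbage collection gets complicated: a block can only be freed once no surviving root reaches it.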

It’s worth noting also that, especially in a distributed environment, these approaches can be combined. For example, VoldFS itself uses a COW approach but most of the actual or candidate data stores from which it allocates its blocks are themselves more log-oriented. As always it’s horses for courses, and different systems – or even different parts of the same system – might be best served by different approaches. That’s why I thought it was worth describing multiple alternatives and the tradeoffs between them.

As I mentioned in my last post, I’ve been getting increasingly annoyed at a lot of the flak that has been directed toward MongoDB over data-protection issues. I’m certainly no big fan of systems that treat memory as primary storage (with or without periodic flushes to disk) instead of a cache or buffer for the real thing. I’ve written enough here to back that up, but I’ve also written plenty about something that bugs me even more: FUD. Merely raising an issue isn’t FUD, but the volume and tone and repetition of the criticism are all totally out of proportion when there are so many other data-protection issues we should also worry about. Here are just a few ways to lose data.

  • Don’t provide full redundancy at all levels of your system. It’s amazing how many “distributed” systems out there aren’t really distributed at all, leaving users entirely vulnerable to loss or extended unreachability of a single node, without one peep of protest from the people who are so quick to point the finger at systems which can at least survive that most-common failure mode.
  • Be careless about non-battery-backed disk caches. If data gets stranded in a disk cache when the power goes out, it’s no different than if it was stranded in memory, and yet many projects do absolutely nothing to detect let alone correct for obvious problems in this area.
  • Be careless about data ordering in the kernel. My colleagues who work on local filesystems and pieces of the block-device subsystem in Linux (and others working on other OSes) have done a great deal of too-little-appreciated work to provide the very highest levels of data safety that they can without sacrificing any more performance than necessary. Then folks who preach the virtues of append-only files without knowing anything at all about how they work turn around and subvert all that effort by giving mount-command and fstab-line examples that explicitly put filesystems into async mode, turn off barriers, etc.
  • A special case of the previous point is when people actually do seem to know the options that assure data protection, but forgo those options for the sake of getting better benchmark numbers. That’s simply dishonest. You can’t claim great performance and great data protection if users can only really get one or the other depending on which options they choose. Pick one, and shut up about the other.
  • Be careless about your own data ordering. A single I/O operation can require several block-level updates. Many overlapping operations can create a huge bucket of such updates, conflicting in complex ways and requiring very careful attention to the order in which the updates actually occur. If you screw it up just once, and it takes a special brand of arrogance to believe that could never happen to you, then you corrupt data. If you corrupt metadata, you might well lose the user data it points to. If you corrupt user data that can be even worse than losing it, because there are security implications as well. It’s not nice when some of your confidential data becomes part of somebody else’s file/document/whatever. At least with mmap-based approaches, it’s fairly straightforward to do things with msync and fork and hypervisor/filesystem/LVM snapshots to at least guarantee that the state on disk remains consistent even if it’s not absolutely current.
  • Don’t provide any reasonable way to take a backup, which would protect against the nightmare scenario where data is lost not because of a hardware failure but because of a bug or user error that makes your internal redundancy irrelevant.
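As a small illustration of what "careful about ordering" means at the application level, here is a sketch of an append that does not return until the data has been handed to stable storage. It assumes a POSIX system, and `durable_append` is a made-up helper name. The important caveat, per the points above, is that `fsync` is only as good as the layers beneath it: if the filesystem is mounted with barriers off, or the drive has a volatile write cache that ignores flushes, the data can still be stranded.

```python
import os


def durable_append(path, record: bytes):
    """Append record to path; return only after fsync has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(record)
        while view:                      # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # ask the kernel to flush file data and metadata
    finally:
        os.close(fd)

    # If the file was just created, its directory entry must be flushed too,
    # otherwise the file itself can vanish after a crash (POSIX assumption).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The point of the sketch is the order of operations: write, flush, and only then tell the caller it's done. Acknowledging before the flush is where the lying starts.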

Of course, some of these issues won’t apply to Your Favorite Data Store, e.g. if it doesn’t have a hierarchical data model or a concept of multiple users. Then again, the list is also incomplete because the real point I’m making is that there are plenty of data-protection pitfalls and plenty of people falling into them. Some of the loudest complainers already had to suspend their FUD campaign to deal with their own data-corruption fiasco. Others are vulnerable to having the same thing happen – I can tell by looking at their designs or code – but those particular chickens haven’t come home to roost yet.

Look, I laughed at the “redundant coffee mug” joke too. It was funny at the time, but that was a while ago. Since then it’s been looking more and more like junior-high-school cliquishness, poking fun at a common target as a way to fit in with the herd. It’s not helping users, it’s not advancing the state of the art, and it’s actively harming the community. As one of the worst offenders once had the gall to tell me, be part of the solution. Find and fix new data-protection issues in whichever projects have them, instead of going on and on about the one everybody already recognizes.

Pomegranate is a new distributed filesystem, apparently oriented toward serving many small files efficiently (thanks to @al3xandru for the link). Here are some fairly disconnected thoughts/impressions.

  • The HS article says that “Pomegranate should be the first file system that is built over tabular storage” but that’s not really accurate. For one thing, Pomegranate is only partially based on tabular storage for metadata, and relies on another distributed filesystem – Lustre is mentioned several times – for bulk data access. I’d say Ceph is more truly based on tabular storage (RADOS) and it’s far more mature than Pomegranate. I also feel a need to mention my own CassFS and VoldFS, and Artur Bergman’s RiakFuse, as filesystems that are completely based on tabular storage. They’re not fully mature production-ready systems, but they are counterexamples to the original claim.
  • One way of looking at Pomegranate is that they’ve essentially replaced the metadata layer from Lustre/PVFS/Ceph/pNFS with their own while continuing to rely on the underlying DFS for data. Perhaps this makes Pomegranate more of a meta-filesystem or filesystem sharding/caching layer than a full filesystem in and of itself, but there’s nothing wrong with that just as there’s nothing wrong with similar sharding/caching layers for databases. Compared to Lustre, this is a significant step forward since Pomegranate’s metadata is fully distributed. Compared to Ceph, though, it’s not so clearly innovative. Ceph already has a distributed metadata layer, based on advanced distribution algorithms to distribute load etc. Pomegranate’s use of ring-based consistent hashing suits my own preference a little better than Ceph’s tree-based approach (CRUSH), but there are many kinds of ring-based hashing and it looks like Pomegranate won’t really catch up to Ceph in this regard until their scheme is tweaked a few times.
  • I’m really not wild about the whole “in-memory architecture” thing. If your update didn’t make it to disk because it was at the end of the in-memory queue and hadn’t been flushed yet, that’s no better for reliability than if you just left it in memory forever (though it does improve capacity), and if you acknowledged the write as complete then you lied to the user. Prompted by some of the hyper-critical and hypocritical comments I’ve seen lately bashing one project for lack of durability, I have another blog post I’m working on about how the critics’ own toys can lose or corrupt data, and how claiming superior durability while using “unsafe” settings for benchmarks is dishonest, so I’ll defer most of that conversation for now. Suffice it to say that if I were to deploy Pomegranate in production one of the first things I’d do would be to force the cache to be properly write-through instead of write-back.
  • I can see how the Pomegranate scheme efficiently supports looking up a single file among billions, even in one directory (though the actual efficacy of the approach seems unproven). What’s less clear is how well it handles listing all those files, which is kind of a separate problem similar to range queries in a distributed K/V store. This is something I spent a lot of time pondering for VoldFS, and I’m rather proud of the solution I came up with. I think that solution might be applicable to Pomegranate as well, but need to investigate further. Can Ma, if you read this, I’d love to brainstorm further on this.
  • Another thing I wonder about is the scalability of Pomegranate’s approach to complex operations like rename. There’s some mention of a “reliable multisite update service” but without details it’s hard to reason further. This is a very important issue because this is exactly where several efforts to distribute metadata in other projects – notably Lustre – have foundered. It’s a very very hard problem, so if one’s goal is to create something “worthy for [the] file system community” then this would be a great area to explore further.
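For readers who haven't met ring-based consistent hashing, here is a minimal generic sketch. It is not Pomegranate's scheme (the available material doesn't give enough detail to reproduce that) and it is certainly not CRUSH. The class name `HashRing`, the `vnodes` parameter, and the choice of MD5 as the placement hash are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any well-mixed hash works here; MD5 is used for placement, not security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node owns several points ("virtual nodes") to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str):
        # A key belongs to the first point clockwise from its hash, wrapping around.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

The property that matters is that removing a node only moves the keys that node owned; everything else stays put. Note that this answers "where does this one file live?" and nothing else. Listing a directory still means asking every node, which is the range-query problem raised above.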

Some of those points might seem like criticism, but they’re not intended that way – or at least they’re intended as constructive criticism. They’re things I’m curious about, because I know they’re both difficult and under-appreciated by those outside the filesystem community, and they’re questions I couldn’t answer from a cursory examination of the available material. I hope to examine and discuss these issues further, because Pomegranate really does look like an interesting and welcome addition to this space.

A lot of people are commenting on the Oracle/Google suit without having looked at the patents involved. That’s a bad idea, guaranteed to yield incorrect conclusions. For reference, here are the ones actually mentioned in the formal complaint . . . and yes, I did enjoy looking these up on Google.

  • 6125447: Protection domains to provide security in a computer system
  • 6192476: Controlling access to a resource
  • 5966702: Method and apparatus for pre-processing and packaging class files
  • 7426720: System and method for dynamic preloading of classes through memory space cloning of a master runtime system process
  • RE38,104: Method and apparatus for resolving data references in generated code
  • 6910205: Interpreting functions utilizing a hybrid of virtual and native machine
  • 6061520: Method and system for performing static initialization

First thing to remember is that this is a patent suit, not a copyright suit. That means it’s not about “Java” at all. It’s about certain ways of implementing a dynamic runtime, regardless of what name or input language is used. In that context, 5966702 is probably the most specific to Oracle’s actual Java-runtime technology, and that’s all about class files. The others are pretty general ideas, even if the Java runtime was the first embodiment used in the patent descriptions. For purposes of determining infringement, it’s mostly the claims – not the description – that matter. It’s probably quite premature for anybody who hasn’t looked at the Dalvik code to say whether it infringes most of these patents or not, or whether Google could avoid infringing on these claims without fundamentally changing how Dalvik works.

By now, most people interested in NoSQL and cloud storage and so on have probably seen the story of go-derper, which demonstrates two things.

  1. Memcached has no security of its own.
  2. Many people deploy memcached to be generally accessible.

Obviously, this is a recipe for disaster. Less obviously, the problem is hardly limited to memcached. Most NoSQL stores have no concept of security. They’ll let anyone connect and fetch or overwrite any object. One of the best known doesn’t even check that input is well formed, so “cat /dev/urandom | nc $host $port” from anywhere would crash it quickly. Among all of the other differences between SQL and NoSQL systems – ACID, joins, normalization and referential integrity, scalability and partition tolerance, etc. – the near-total abandonment of security in NoSQL is rarely mentioned. Lest it seem that I’m throwing stones from some other garden, I’d have to say many filesystems hardly fare any better. For example, I generally like GlusterFS but it provides only the most basic kind of protection against information leakage or tampering. As a POSIX filesystem it at least has a notion of authorization between users, but it does practically nothing to authenticate those users and authorization without authentication is meaningless. The system-level authorization to connect is trivially crackable, and once I’ve done that I can easily spoof any user ID I want – including root. I’ve had to make the point over and over again in presentations that cloud storage in general – regardless of type – is usually only suitable for deployment within a single user’s instances, protected by those instances’ firewalls and sharing a common UID space. For most such stores, if a cloud provider wants to offer it as a public, shared, permanent service separate from compute instances, a lot more work needs to be done.

What kind of work? Mostly it falls into two categories: encryption and authentication/authorization (collectively “auth”). For encryption, there’s a further distinction to be made between on-the-wire and at-rest encryption. A lot of cloud-storage vendors make all sorts of noise about their on-the-wire encryption, but they stay quiet or vague about at-rest encryption and that’s actually more important. The biggest threat to your data is insiders, not outsiders. The insiders aren’t even going on the wire, so all of that AES-256 encryption there doesn’t matter a bit. Insiders should also be assumed to have access to any keys you’ve given the provider, so the only way you can really be sure nobody’s messing with your data is if you never give them unencrypted data or keys for that data. Your data must remain encrypted from the moment it leaves your control until the moment it returns again, using keys that only you possess. I know how much of a pain that is, believe me. I’ve had to work through the details of how to reconcile this level of security with multi-level caching and byte addressability in CloudFS, but it’s the only way to be secure. Vendors’ descriptions of what they’re doing in this area tend to be vague, as I said, but Nasuni is the only one who visibly seems to be on the right track. It sure would be nice if people could get that functionality through open source, instead of paying both a software and a storage provider to get it. Cue appearance by Zooko to plug Tahoe-LAFS in 5, 4, 3, …

The other area where work needs to be done is handling user identities, which covers both auth and identity mapping. For starters, the storage system must internally enforce permissions between users, which of course means it must have a notion of there even being multiple users. For systems which can assume that a single connection belongs to a single user, you can then authenticate using SASL or similar and be well on your way to a full solution. For systems that can’t make such an assumption, which includes things like filesystems, that’s not sufficient. You need to identify and authenticate not just the system making a request, but the user as well. I’m not a security extremist, so I can accept the argument that if you can fully authenticate a system and communicate with it through a secure channel then you can trust it to identify users correctly. The alternative is something like GSSAPI, which requires less trust in the remote system but can be a pretty major pain to implement.

The last issue is identity mapping. Even if you can ensure that a remote system is providing the correct user IDs, those IDs are still only correct in their context. If you’re a cloud service provider, you really can’t assume that tenant A’s user X is the same as tenant B’s user X. Therefore, you need to map A:X and B:X to some global users P and Q. Because you might need to store these IDs and then return them later (e.g. on a stat() call if you’re a filesystem) you need to be able to do the reverse mapping back to A:X and B:X as well. Lastly, because cloud tenants can and will create new users willy-nilly, you can’t require pre-registration; you need to create new mappings on the fly, whenever you see a new ID. This ends up becoming pretty entangled with the authentication problem because authentication information needs to be looked up based on the global (not per-tenant) ID, so this can all be a big pain but – again – it’s the only way to be secure.
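The mapping requirements above (forward, reverse, and created on the fly) fit in a very small sketch. Everything here is hypothetical: the class name `IdentityMap`, the starting global ID, and the in-memory dictionaries. A real service would need the map to be persistent, consistent across servers, and tied into authentication, which is where the pain actually lives.

```python
import itertools


class IdentityMap:
    """Maps (tenant, local user ID) pairs to global IDs and back."""

    def __init__(self, first_global_id=100000):
        self._next = itertools.count(first_global_id)  # arbitrary starting point
        self._to_global = {}
        self._to_local = {}

    def to_global(self, tenant, local_uid):
        key = (tenant, local_uid)
        global_id = self._to_global.get(key)
        if global_id is None:
            # No pre-registration: mint a mapping the first time an ID is seen.
            global_id = next(self._next)
            self._to_global[key] = global_id
            self._to_local[global_id] = key
        return global_id

    def to_local(self, global_id):
        # Reverse mapping, e.g. to fill in the owner on a stat() reply.
        return self._to_local[global_id]
```

With this, A:X and B:X come out as two different global users even though X is the same number in both tenants, and either one can be translated back when the stored ID has to be returned.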

To sum up, the lesson of go-derper is not that memcached is uniquely bad. Lots of systems are equally bad, and making them less bad is going to be hard, but it needs to be done before the other promises made by those systems can be realized. For a great many people, systems that are so totally insecure are useless, no matter what other wonderful functionality they might provide.

As many people I’ve talked to IRL probably know, I really hate language-specific package managers. Java has several, Python/Ruby/Erlang etc. each have their own, etc. I totally understand the temptation. I know it’s not all about NIH Syndrome (though some is); some of it’s about Getting Stuff Done as well. Consider the following example. I tried to install Tornado using yum.

[root@fserver-1 repo]# yum install python-tornado
Loaded plugins: presto
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package python-tornado.noarch 0:1.0-2.fc15 set to be updated

(hundreds of lines of dependency stuff)

Transaction Summary
=====================================================================================
Install      13 Package(s)
Upgrade     161 Package(s)

Total download size: 156 M
Is this ok [y/N]: n

Is this OK? Are you kidding? Of course it’s not OK, especially when I can see that the list includes things like gcc, vim, and yum itself. I know how systems get broken, and that’s it. By way of contrast, let’s see how it goes with easy_install.

[root@fserver-1 repo]# easy_install tornado
Searching for tornado
Reading http://pypi.python.org/simple/tornado/
Reading http://www.tornadoweb.org/
Best match: tornado 1.0
Downloading http://github.com/downloads/facebook/tornado/tornado-1.0.tar.gz
Processing tornado-1.0.tar.gz
Running tornado-1.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-6Wcauv/tornado-1.0/egg-dist-tmp-NEPqMm
warning: no files found matching '*.png' under directory 'demos'
zip_safe flag not set; analyzing archive contents...
tornado.autoreload: module references __file__
Adding tornado 1.0 to easy-install.pth file

Installed /usr/lib/python2.6/site-packages/tornado-1.0-py2.6.egg
Processing dependencies for tornado
Finished processing dependencies for tornado

Yeah, I see the appeal. On one hand, hours spent either rebuilding a broken system or debugging the problems that are inevitable when 161 packages get updated. On the other hand, Getting Stuff Done in about a minute. Yes, I tested, and the result does work fine with the packages/versions I already had. Still, though, having to do things this way is awful. It’s bad enough that there are still separate package managers for different Linux distros, but now programmers need to have several different package managers on one system just to install the libraries and utilities they need. Worse still, most of these language-specific package managers suck. None of them handle licensing, and few of them handle dependency resolution in any kind of sane way. One of the most popular Java package managers doesn’t even ask before downloading half the internet with no version or authenticity checking to speak of. Good-bye, repeatable builds. Hello, Trojan horses. I can see (above) the problems of having One Package Manager To Rule Them All, or of having dependency resolution be too strict, but there has to be a better way.

What if the system package manager could delegate to a language-specific package manager when appropriate (e.g. yum delegating to easy_install in my example)? Then the system package manager could save itself a lot of work in such cases, and also avoid violating the Principle of Least Surprise when installing in the “standard way” for the system yields different results than installing in the “standard way” for the language. There’d still be difficult cases when dependencies cross language barriers, but those are cases that the system package manager already has to deal with. I know there are a lot of details to work out (especially wrt a common format for communicating what’s wanted and what actually happened), possibly there’s even some fatal flaw in this approach, but my first guess is that a federation/delegation model is likely to be better than an everyone-conflicting model.

About eight years ago, I wrote a series of posts about server design, which I then combined into one post. That was also a time when debates were raging about multi-threaded vs. event-based programming models, about the benefits and drawbacks of TCP, etc. For a long time, my posts on those subjects, with that server-design article as the centerpiece, constituted my main claim to fame in the tech-blogging community, until more recent posts on startup failures and CAP theorem and language wars started reaching an even broader audience. Now some of those old debates have been revived, and Matt Welsh has written a SEDA retrospective, so maybe it’s a good time for me to follow suit to see what I and the rest of the community have learned since then.

Before I start talking about the Four Horsemen of Poor Performance, it’s worth establishing a bit of context. Processors have actually not gotten a lot faster in terms of raw clock speed since 2002 – Intel was introducing a 2.8GHz Pentium 4 then – but they’ve all gone multi-core with bigger caches and faster buses and such. Memory and disk sizes have gotten much bigger; speeds have increased less, but still significantly. Gigabit Ethernet was at the same stage back then that 10GbE is at today. Java has gone from being the cool new kid on the block to being the grumpy old man the new cool kids make fun of, with nary a moment spent in between. Virtualization and cloud have become commonplace. Technologies like map/reduce and NoSQL have offered new solutions to data problems, and created new needs as well. All of the tradeoffs have changed, and of course we’ve learned a bit as well. Has any of that changed how the Four Horsemen ride?

Presentations are the bane of the modern engineer’s existence. If you’re watching a presentation then it means you’re in a meeting, which is already something most of us don’t enjoy, and even worse it means you’re in a kind of meeting (or part of a meeting) that’s only minimally interactive. If you’re giving a presentation, that means even more time away from the technical tasks that drew you to this profession. Nonetheless, any project leader/advocate nowadays and for the last several years has had to spend a lot of time and energy on what is essentially a marketing activity, which is why I dubbed it “markelopment” (a deliberate riff on “devops”) on Twitter. I’m not among those who think presentations are always evil and should be shunned, but after having created/delivered quite a few presentations and sat through a great many more, I think I’m in a position to offer just a little bit of advice.

First, I’ll say that hundred-slide decks annoy me. Yes, I know it’s usually a reaction to the problem of slides that are too few and too densely packed, leading to the also-awful phenomenon of the presenter spending most of the time just repeating what everyone can already read, but it’s an over-reaction. The other day I was reading some slides online, and I encountered the following pattern:

Slide N-1: (clip art)
Slide N: “vs.”
Slide N+1: (more clip art)

A whole slide just for “vs.”? That’s wasting my time. Presenters who use that style end up spending too much of their presentation actually changing slides and waiting the obligatory five seconds for the audience to catch up, no matter how little content is on each. Stephen Foskett pointed out that Lawrence Lessig only puts one word on each slide and is still a very highly regarded speaker. Well, yeah, he’s Lawrence Lessig. I’m not, you’re not, and probably neither is anyone you know (unless of course you know Lessig).

Now, I know presentation length can be tricky. I myself do tend to err on the side of making my slides too busy and very spare graphically. I do that because I know that the slides are likely to be viewed more in email etc. than with me actually presenting them, so to make sure they’re useful as a reference I often sacrifice a little on the “live” side. What I’d generally like to do is create two decks – one verbally spare and graphically rich to illustrate or anchor what I’m saying live, and a longer form for sending around later. That means even more time spent in Impress, though, and is often not feasible for various other reasons as well. My best advice is to determine a good “minutes per slide” figure based on the content, the audience, and an honest appraisal of your own ability to keep the audience interested while the slides aren’t changing, then use that to determine an appropriate slide count. If you’re a very dynamic speaker, you can go the Lessig route and spend five minutes on a one-word slide. If you need a hundred slides to fill a thirty-minute presentation, then maybe you’re admitting something about your speaking skills or the intrinsic value of what you’re presenting.

Second lesson: don’t get too cute. I’ve seen too many presentations lately, especially in the “edgier” tech areas, where the author had obviously spent way more time on finding funny clip art and quotes than on the actual content. Again, it’s a balancing act. Humor is good. A good quote or graphic can be an absolutely fantastic anchor for an important point, which you then elaborate or build on verbally. One not-really-funny slide after another after another with too little in between is just distracting.

Another error that I find even less excusable is simple ugliness. Yesterday I saw a presentation which had been done entirely – from title to closing – in what looked like a version of Comic Sans done to look like paint-brush strokes (house painting, not portrait painting). It wasn’t very readable, and looked totally amateurish. I was embarrassed for the author.

Now, somebody’s probably going to think I’m saying that I’ll totally dismiss an otherwise good presentation of an important idea because of slide count or graphics or font choice. Not so. I’ll still listen, but it will cost the author a “point” in my mind. It’s worth keeping in mind that, in these situations, every single point can matter. If you’re presenting to hundreds of people and only care if one or two respond in any significant way to what you’re saying, then maybe none of this matters. Far more presentations are given in smaller groups, though, where the opinion of everyone at the table does matter. People being how they are, they will use all sorts of nuances to form an impression of whether you’re smart, whether you’re trustworthy, etc. It probably won’t be one big thing that causes you not to get that next meeting, but an accumulation of little things. (If you think “meritocratic” open-source techies are any different, BTW, you’re just kidding yourself. The standards are different, but they’re just as stringently applied. Set the wrong tone and you’ll be written off just as surely and completely.) Why give someone the chance to think that you’re too serious or too frivolous, that your presentation shows disorganization, poor prioritization or disrespect for others’ time or sensibilities? Focus on content, by all means, but take just a little time to make sure it’s being delivered in a way that will ensure a good reception.

Amazon has announced Cluster Compute Instances for EC2. This is a very welcome development. Having come from SiCortex, where we provided a somewhat cloud-like ability for users to allocate large numbers of very well-connected nodes on demand, I’ve been talking to people about the idea of provisioning cloud resources on special machines like this for at least the past year. In that light, I find a couple of things about the announcement a bit surprising. Let’s go down the specs first.

  • There’s a single type of HPC instance – dual quad-core “Nehalem” processors with 23GB of memory. The Amazon page points out that this small additional amount of transparency about the exact CPU type allows people to do processor-specific optimization that they generally can’t do elsewhere in EC2.
  • Each instance comes with 1.7TB of instance storage. Performance is not mentioned, but at modern disk drive sizes that might well be just two drives.
  • Connectivity is via 10GbE (NIC and switch vendors not specified). Yuk. 10GbE still lags behind InfiniBand in terms of both bandwidth and latency, both absolute and per dollar. Much has been made lately of the significant and increasing dominance of IB in the HPC world, especially in the Top 500, and the customers Amazon is trying to attract are likely to consider 10GbE a strange choice at best.
  • There is a default limit of eight Cluster Compute Instances without filling out a request form. Eight machines is not enough for serious work of this type, even when the machines are this powerful, so that’s going to affect – and annoy – practically every user.
  • The instances are $1.60 per hour, which is $38.40 per day or about $1,150 for a 30-day month. There are others far better qualified to comment on the economics, so I’ll leave it at that.
  • Cluster Compute Instances are only available in one availability zone.
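
For anyone who wants to check the pricing arithmetic or scale it up, here is a rough sketch. The only inputs taken from the announcement are the $1.60 hourly rate and the default limit of eight instances; the 30-day month and the `cost_usd` helper are assumptions for illustration, and the figures ignore storage, data transfer, and any other charges.

```python
HOURLY_RATE_USD = 1.60        # per-instance price quoted in the announcement
DEFAULT_INSTANCE_LIMIT = 8    # default cap before a request form is needed
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30           # assumption: a 30-day month


def cost_usd(instances: int, hours: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Instance-hours times the hourly rate, rounded to cents."""
    if instances < 0 or hours < 0:
        raise ValueError("instances and hours must be non-negative")
    return round(instances * hours * hourly_rate, 2)


one_day = cost_usd(1, HOURS_PER_DAY)
one_month = cost_usd(1, HOURS_PER_DAY * DAYS_PER_MONTH)
full_quota_month = cost_usd(DEFAULT_INSTANCE_LIMIT, HOURS_PER_DAY * DAYS_PER_MONTH)

print(f"one instance, one day:        ${one_day:,.2f}")
print(f"one instance, 30 days:        ${one_month:,.2f}")
print(f"eight instances, 30 days:     ${full_quota_month:,.2f}")
```

A single instance left running around the clock comes out a bit above a thousand dollars a month, and the full default quota of eight is a little over nine thousand.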

My first thought is that the new offering as currently specified is nowhere near as interesting as it could be – and might be, as the service continues to evolve. Faster interconnects are one obvious way to make it more interesting. Removing the eight-machine default limit – which I strongly suspect is related to the capacity of the switches they’re using – is another. Then it gets even more interesting. When I’ve talked to people about heterogeneous clouds, which is what we’re heading towards here, I’ve generally meant far more kinds of specialization than this. How about instances which are optimized for communications instead of computation, such as with the same 10GbE (or better) but less powerful processors? How about instances which are optimized for disk I/O with multiple spindles and/or SSDs? How about special GPU-equipped instances? Once you can deal with the kind of heterogeneity that today’s CCIs represent, it’s but a short step to handling these other variations as well, so today’s announcement might merely foreshadow even bigger things to come.

The other thought I have about this is that it’s not just about the individual instances. The ability to specify that several instances should be provisioned close to one another – probably on the same switch for reasons I mentioned above – is interesting with respect to both the user experience and the infrastructure needed to support it. Location transparency might be a defining feature of cloud computing, but that’s only in the sense of absolute location. Relative location is still a very valid parameter for allocation of cloud-computing resources. When you define a “cluster placement group” in EC2 you’re effectively saying that these instances should all be close to one another, regardless of where they all are relative to anything else. In other situations, such as disaster recovery, you might want to say certain instances should definitely be far from each other instead. We’ve been thinking through a lot of these issues on Deltacloud, but this isn’t a work blog so it would be both unwise and distasteful to say much more about that right now. Suffice it to say that facilitating this kind of placement requires a much more sophisticated cloud infrastructure than “grab whatever’s free wherever it is”, which is pretty much the current standard. When you consider relationships not only between instances but also between instances and the data or connectivity they need, it can become quite a science project. The possibility that Amazon might be doing some of that science is, to me, one of the most exciting things about this announcement.
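
To make the "near" versus "far" distinction concrete, here is a toy sketch of relative placement. It is emphatically not how Amazon or Deltacloud does it: the host-to-switch map, the `place_near` and `place_far` names, and the idea that "close" means "same switch" are all assumptions for illustration, and a real allocator would also have to weigh capacity, data locality, and connectivity.

```python
from collections import defaultdict


def _group_by_switch(free_hosts: dict[str, str]) -> dict[str, list[str]]:
    """Turn a host -> switch map into switch -> sorted list of free hosts."""
    by_switch: dict[str, list[str]] = defaultdict(list)
    for host, switch in free_hosts.items():
        by_switch[switch].append(host)
    return {switch: sorted(hosts) for switch, hosts in by_switch.items()}


def place_near(free_hosts: dict[str, str], count: int) -> list[str]:
    """Affinity: pick `count` free hosts that all share one switch."""
    by_switch = _group_by_switch(free_hosts)
    for switch in sorted(by_switch):
        if len(by_switch[switch]) >= count:
            return by_switch[switch][:count]
    raise RuntimeError("no single switch has enough free hosts")


def place_far(free_hosts: dict[str, str], count: int) -> list[str]:
    """Anti-affinity: pick `count` free hosts, each on a different switch."""
    by_switch = _group_by_switch(free_hosts)
    if len(by_switch) < count:
        raise RuntimeError("not enough distinct switches with free hosts")
    return [by_switch[switch][0] for switch in sorted(by_switch)[:count]]


# Hypothetical inventory: four free hosts spread over three switches.
free = {"h1": "sw-a", "h2": "sw-a", "h3": "sw-b", "h4": "sw-c"}
print(place_near(free, 2))  # two hosts on the same switch
print(place_far(free, 3))   # three hosts, no two on the same switch
```

Even in this toy, "grab whatever's free wherever it is" stops being good enough: a near request can fail while plenty of hosts are free overall, simply because they sit behind the wrong switches.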