All of a sudden, everybody is talking about concurrent programming. The driving factor, of course, is the fact that multi-core processors have moved beyond the server room, into the machines people use for web surfing and games. Whether that’s a good kind of evolution is a question best left for another time. Right now, much of the debate is over how to make it possible for programmers to take full advantage of all that extra processing power. Because of what’s driving it, most of this conversation is focused on systems where all processors share equal access to a single memory space. In fact, neither the access times nor the memory layout are always completely uniform, but that’s not really important here. In the past, I’ve written quite a bit about programming in this kind of environment. It’s probably why many of you are here, but today I’ll be going in a slightly different direction. I say “slightly” because I won’t be going to the other extreme – internet scale with dozens-of-millisecond latencies and high node/link failure rates – either. For now, I’ll be somewhere in between – clusters of machines with separate memory, connected with some kind of modern interconnect such as InfiniBand or 10GbE.

Before I get started, let’s talk about why such clusters have become dominant both in scientific computing (e.g. Roadrunner, Jaguar) and in internet services (e.g. Google, Amazon, Facebook). The basic reason is the same one behind the move to multi-core processors. Just as it has become prohibitively difficult to make a single processor go faster, it has long been prohibitively difficult to make a shared-memory system go faster. Yes, it might be possible, but the cost and complexity of such a system (where complexity tends to be the inverse of reliability) make it a poor alternative to clusters. In one case this forces a move to multiple processors per system – or even per die – while in the other it forces a move to multiple systems clustered together. At this point some might say that the shared-memory programming model can still be preserved even if the memory is actually distributed, with the “illusion” maintained by some combination of hardware and system software, and they’re right. I’ve worked on such systems, and they’re a great way to support either new or old code based on that programming model, but I’d argue that it’s the wrong programming model for that kind of system. The main reason that shared-memory systems ever became common was performance – processor loads and stores being orders of magnitude faster than messaging except on the most exotic of architectures – but once you’ve gone “outside the box” that advantage is lost. To maintain that programming model you’ll be using messages to emulate shared memory. Why not exchange those messages directly, avoiding a layer of paradigm translation? Some applications are inherently biased toward a shared-memory approach, and benefit from such emulation. Somewhat more applications are inherently biased toward messaging, and more still can be implemented equally well using either approach – if the author has an equal understanding of both.
Much shared-memory code is written that way only because of familiarity, not because of innate appropriateness. That’s what I’m trying to change.

When it comes to programming models, everyone’s favorite whipping boy is the model where access to shared memory is controlled using locks or their close relatives (e.g. semaphores, condition variables). To be sure, this approach is fraught with peril – race conditions, deadlock, livelock, thundering herds, indefinite postponement, lock or priority inversion, and the list just keeps going. The funny thing is that most of these don’t really go away with any of the programming models that are proposed as solutions (including the actor model). For example, software transactional memory is what all the Cool Kids talk about the most. It’s a good model, with many advantages over lock-based programming, but a program can still deadlock at a higher level even if it’s using STM to avoid deadlock at a lower level. There’s an old saying that a bad programmer can write Fortran in any language, and it’s equally true that a bad programmer can create deadlock in any programming model. All you need to do is use your cool programming model to implement program-specific primitives at too low a level, perhaps even equivalent to traditional locks (surprisingly easy to do even if you didn’t mean to) and all of those old problems reappear. STM also has a few of its own problems, such as a harder time avoiding indefinite-postponement problems and no good answer for non-retriable actions such as I/O. Most importantly of all, nobody seems to have come up with a robust and efficient way to implement STM without shared memory, so it inherits all of the same scaling problems as lock-based programming does. (I’d be glad if somebody could prove me wrong on this point, but don’t expect it to be easy.)

OK, I hear you say, we get it, shared memory is bad. We’re stuck with messaging, but that’s just as much of a mess. In a way, that’s true. The problem is that messages alone are too low-level. You can use them in a structured way to implement higher-level abstractions, or you can use them in a very ad hoc kind of way . . . just like you can use shared memory and locks, just like you can use class hierarchies, just like you can use almost any processor or programming-language feature. Some people’s distaste for messaging is based on exposure to high-level design-by-committee messes like CORBA, SOA, or MPI. Some people’s distaste is based on exposure to low-level design-by-idiot messes of ad hoc messaging. It’s all a matter of what abstractions you choose, plus how well you implement and use them. There are no silver bullets. Nonetheless, the actor model might be able to help.

So, what is the actor model? Perhaps this is best answered by saying what it isn’t. Both lock-based programming and STM are based on a model of passing around control over state. In either model, whichever thread wants to change some state does so itself, perhaps taking some additional action to exclude others or resolve conflicts. Even some message-based systems still have this feature, often hidden behind a lease-granting or checkout/checkin abstraction that allows any thread to gain temporary control over data from its usual owner. In the actor model, each piece of data has one owner, and only that owner can directly access the data. Except for certain failure cases, ownership is permanent and control is never “lent out”; the owner never has to break a lock, revoke a lease, or issue a flush/invalidate request so that the next request can proceed. Anybody else has to send the owner a message requesting the desired action. Think of it as the difference between asking someone else to staple some sheets of paper together, vs. grabbing the stapler to do it yourself. In the actor model, nobody else can use the red stapler. ;) There’s more to the actor model than that, and I’m sure some might even say I’ve misrepresented it, so I strongly recommend you follow the link above before continuing. You can also try Sergio Bossa’s description, or Alex Miller’s link list.
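To make the ownership idea concrete, here’s a minimal sketch in Python – my own names, no actor library assumed – of the stapler as an actor: one thread owns the state, and the only way to touch it is to put a message in the owner’s mailbox.

```python
import queue
import threading

class StaplerActor:
    """Owns the 'stapler' state (a list of stapled documents).
    No other thread ever touches that state directly; they send
    a message to the mailbox and wait for a reply."""

    def __init__(self):
        self._stapled = []              # state owned exclusively by this actor
        self._mailbox = queue.Queue()   # the only way in
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # The owner's loop: handle one request at a time, in arrival order.
        while True:
            msg = self._mailbox.get()
            if msg is None:             # shutdown sentinel
                break
            sheets, reply = msg
            self._stapled.append(tuple(sheets))   # only the owner mutates state
            reply.put(len(self._stapled))         # the answer is a message too

    def staple(self, sheets):
        """Ask the owner to staple these sheets; block for the reply."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((sheets, reply))
        return reply.get()

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()
```

Note that there isn’t a lock in sight: callers never compete for the state itself, only for a slot in the mailbox, and the queue handles that for them.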

It’s important here to note another thing the actor model is not: it’s not a panacea. As James Iry points out, the actor model still involves shared state. Even worse, it tends to make state implicit rather than explicit, the dangers of which I’ve written about before. However, control over state isn’t shared or passed around. Because of this, and because all modifications to state are serialized through the actor controlling that state, a whole bunch of common concurrency problems tend not to occur. Failure modes become simpler, as does reasoning about correctness. However, you can still screw up by using actors to implement simple shared variables, and then all of the usual problems reappear. You can still create coherency problems by caching responses received from an actor, and failing to account for the possibility that the authoritative values might have been changed behind your back.
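That caching pitfall is easy to demonstrate. In this sketch (again my own toy code, not any particular actor library), a counter actor serializes all updates through its mailbox – and a reply you hold onto is a snapshot, not a live view of the actor’s state.

```python
import queue
import threading

class CounterActor:
    """All reads and updates are serialized through the mailbox,
    so the actor's own state is always coherent."""

    def __init__(self):
        self._value = 0
        self._mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            op, reply = self._mailbox.get()
            if op == "incr":
                self._value += 1
            reply.put(self._value)   # reply with the current value

    def ask(self, op):
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((op, reply))
        return reply.get()

counter = CounterActor()

cached = counter.ask("get")      # a snapshot of the value at that moment
counter.ask("incr")              # the authoritative value moves on
# 'cached' is now stale; treating it as current is exactly the
# coherency bug in question -- re-ask the owner instead.
assert cached == 0 and counter.ask("get") == 1
```

The actor kept its own state coherent the whole time; the bug, if there is one, lives entirely in the caller that trusted an old reply.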

Effective use of the actor model requires that you think about the higher level actions on data that your application requires, and create actors that correspond to them. This is an exercise much like (some might say the same as) defining the proper set of objects and methods for your application, only there’s a natural incentive to push the abstraction level higher because we’ve all learned by now that messaging is expensive (even when that’s not really true) and fewer high-level requests are better than many low-level requests. Note that an actor-based program might still perform worse than an equivalent lock-based program on a system that actually does have shared memory – or it might not, depending on issues such as implementation quality, lock or cache-line contention, etc. On a cluster system, it’s highly likely to perform better as there’s no need to translate the application’s high-level actions to and from low-level actions on individual pieces of data.
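As an illustration of pushing the abstraction level up, consider the classic account-transfer example (my own sketch, with hypothetical names): instead of a read/modify/write conversation of low-level requests against two balances – which would need extra machinery to stay atomic – the whole transfer is one high-level message, and the owning actor applies it atomically for free because it handles one message at a time.

```python
import queue
import threading

class AccountsActor:
    """Owns all the balances. A transfer is ONE high-level message,
    not a sequence of low-level reads and writes, so it is atomic
    simply because the actor handles one message at a time."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            src, dst, amount, reply = self._mailbox.get()
            if self._balances.get(src, 0) >= amount:
                self._balances[src] -= amount
                self._balances[dst] = self._balances.get(dst, 0) + amount
                reply.put(("ok", dict(self._balances)))
            else:
                reply.put(("insufficient", dict(self._balances)))

    def transfer(self, src, dst, amount):
        """Send one message describing the whole action; block for the reply."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((src, dst, amount, reply))
        return reply.get()
```

On a cluster, the payoff is exactly the point made above: one message each way per transfer, rather than several round trips to fetch, check, and write back individual pieces of data.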

Now that I’ve spent so long setting the stage, I’m afraid the play will have to wait. In my next post on this subject, I’ll try to provide an example of how to use the actor model in a real program. Don’t worry; you won’t have to learn Erlang or Scala to follow along. Though those languages explicitly support actor-model programming, the model itself is perfectly applicable to better-known languages such as Python or even C.