I guess I might as well jump into the multithreading vs. message passing debate that has been making the rounds. The current meme seems to be that multithreading done improperly can do major damage to a program and its author’s sanity. Should I do a Paul Graham impression and say that people who can’t do multithreading properly are just too stupid to even have computers? Maybe some other time.

First, I’d like to point out that many problems – e.g. race conditions, deadlock/livelock, are held in common between the two approaches, even though they might occur in superficially different ways. Other problems are clearly tradeoffs with no “right” answer. For example, many multithreaded systems exhibit poor performance due to excessive context switching, and message-passing advocates point out that their systems don’t have that problem. Well…

  • The excessive context switches are an implementation artifact, avoidable while retaining a multithreaded paradigm (I’ve done it on more than one occasion). They’re not an inherent problem with multithreading.
  • Message passing systems are only immune to the context-switch problem if they run everything in a single thread, which is to say that they’ve traded away multiprocessor scalability for simplicity and single-thread performance. Message passing systems that use multiple threads – yes, even staged systems like SEDA – are just as prone to excessive context switches as any multithreaded program ever was.

In short, I consider that particular form of advocacy, most often engaged in by the message-passing folks, intellectually dishonest. If a problem is not inherently worse, or less readily solvable, in one system than the other, then it is of no value in differentiating between them.

Secondly, I’d like to point out that I’ve written on this topic before. Here’s a scorecard from almost a year ago, more discussion, and some micro-benchmark results. Mostly these focus on single-thread message passing, but the results really don’t change when SEDA-like approaches are considered.

Lastly, my current thoughts. I’m dealing with these very issues in my current project, and the result has been (surprise!) a hybrid between the two approaches. The bulk of the code is decomposed into stages and looks very much like message passing, but the way that requests pass through stages is more like subroutine calls in a multithreaded system than the sort of dispatcher/scheduler that a pure message passing system would use. In addition, there are several maintenance-related tasks (e.g. a dirty-block flusher, a cache pruner) that iterate over private lists and then inject messages into the staged part of the system. So far, it’s working extremely well for me.

I guess the lesson here is: don’t be an extremist. A hybrid approach can solve the worst problems associated with either “pure” approach, but allowing either one to become dogma can be deadly to your real-world effectiveness as a programmer.