Ingo Molnar has published a Linux patch that introduces “threadlets” – functions that can run within the context of a caller as a simple function call, unless/until they need to block, at which point they are split off into their own threads. Ingo refers to this, not inaccurately, as “on demand parallelism”, and it’s pretty cool. I actually know of one other system that did basically the same thing several years ago, except that it was for kernel interrupt handlers, and it was cool then too. I do worry, though, that people will mistakenly think of threadlets as a complete solution to the problem of balancing function-call convenience with separate-thread scalability, when in reality they are more of a complement to existing mechanisms than a replacement. Specifically, threadlets are best when the code within a threadlet meets two criteria (there’s a rough sketch of the usage pattern right after the list):

  • It might block.
  • It probably won’t.

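To make that concrete, here is roughly what using a threadlet looks like. The names below, threadlet_exec() in particular, are illustrative stand-ins rather than the exact interface from Ingo’s patch; what matters is the shape of the code: the handler is written as ordinary synchronous C, and nothing unusual happens unless it actually blocks.

    /* Illustrative sketch only: threadlet_exec() and its return convention
     * are hypothetical stand-ins for the patch's user-space glue. */
    #include <fcntl.h>
    #include <unistd.h>

    /* Assumed convention: returns nonzero if fn() ran to completion in the
     * caller's context, zero if it blocked and was moved to its own thread. */
    extern long threadlet_exec(long (*fn)(void *), void *arg);

    /* Plain synchronous code: no callbacks, no hand-rolled state machine. */
    static long serve_file(void *arg)
    {
        const char *path = arg;
        char buf[4096];
        long n = 0;
        int fd = open(path, O_RDONLY);      /* usually cached, so no blocking */

        if (fd >= 0) {
            n = read(fd, buf, sizeof(buf)); /* occasionally waits on disk */
            close(fd);
        }
        return n;
    }

    static void submit(const char *path)
    {
        if (!threadlet_exec(serve_file, (void *)path)) {
            /* The handler blocked: it now runs on its own thread, and its
             * result will be collected from a completion queue (not shown). */
        }
    }

If serve_file() never blocks, the whole thing costs about as much as calling it directly; only the rare blocking call pays for a thread.
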
If there is no possibility at all that the code will block, threadlets don’t buy you anything compared to a simple function call. On the other hand, if the code probably will block then threadlets are no better than regular threads. In fact they are regular threads, except that they lack many of the primitives you would otherwise have for managing them. Spawning a threadlet per connection in a network server, for example, would probably be worse than just spawning off a regular thread per connection.

Fortunately, this “might block but probably won’t” space in between is very large. A lot of code doesn’t know whether it will block, because it’s making system calls that hide that information, and hiding it is sort of the whole point of a multi-process operating system. An interface that made blocking more explicit or visible from the user-space code’s point of view would also involve many more kernel/user boundary crossings, which would significantly degrade performance and impede debugging. The result is that an application often has to make a pessimistic assumption: even if it knows that a particular operation will hardly ever block, it has to execute that operation in a separate thread just in case it does. Threadlets handle such cases near-optimally, with even less effort than that kind of just-in-case threading, and that’s good.
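
For comparison, the pessimistic pattern described above looks something like this in plain pthreads (my own illustration, not code from the patch): the read will almost always be satisfied from the page cache, but because the application can’t tell which calls are the unlucky ones, it pays for a thread on every single call.

    #include <pthread.h>
    #include <unistd.h>

    struct op {
        int     fd;
        char    buf[4096];
        ssize_t result;
    };

    static void *do_read(void *arg)
    {
        struct op *o = arg;

        /* Almost always completes from the page cache without blocking,
         * but from here there is no way to know that in advance. */
        o->result = read(o->fd, o->buf, sizeof(o->buf));
        return NULL;
    }

    /* Pessimistic: spawn a thread for every operation, just in case this
     * happens to be the rare one that blocks. */
    static int submit_read(struct op *o, pthread_t *tid)
    {
        return pthread_create(tid, NULL, do_read, o);
    }

With a threadlet, the same read would simply be attempted in the caller’s context, and the extra thread would only materialize on the rare occasion it’s actually needed.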

It will be interesting, though, to see if threadlets tend to mask concurrent-programming errors. It’s not hard at all to imagine a program in which some threadlet code always executes in the original context in testing, and then does switch to a new thread in production. In my experience, “dual mode” code that can run on either one or multiple threads (or for that matter processors) almost never exhibits the same bugs in both modes. I almost think it might be useful to have a way of telling the threadlet subsystem to run threadlets in their own threads even if it doesn’t need to, just to make sure that that case gets tested and debugged.
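
In user space the idea might look something like the wrapper below (again my own sketch; a real knob would have to live in the threadlet subsystem itself). With the hypothetical FORCE_ASYNC switch set, every piece of work takes the “blocked” path and runs on its own thread, so the concurrent case gets exercised even when nothing would actually have blocked.

    #include <pthread.h>
    #include <stdlib.h>

    typedef void *(*work_fn)(void *);

    /* User-space analogy of the suggested debug mode; FORCE_ASYNC is a
     * hypothetical switch, not part of the threadlet patch. */
    static void run_work(work_fn fn, void *arg)
    {
        if (getenv("FORCE_ASYNC")) {
            /* Always take the "blocked" path: hand the work to a new
             * thread and let the caller race ahead, as it would whenever
             * a real threadlet blocked in production. */
            pthread_t tid;

            if (pthread_create(&tid, NULL, fn, arg) == 0)
                pthread_detach(tid);
        } else {
            /* Common case: behaves exactly like a direct function call. */
            fn(arg);
        }
    }

Running a test suite once in each mode would at least flush out the bugs that only appear when the caller and the threadlet really do run concurrently.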