A recent Slashdot article about the latest release of Tanenbaum’s Minix led, predictably, to a rehash of the old microkernel vs. monolithic kernel flame war. All of the usual non-fun ensued, such as having to point to L4 and QNX to refute myths about how microkernels are necessarily slow and have never proven useful in the real world. The weirdest part was when a couple of microkernel-bashers who clearly had no idea how much OS support they get for their own programs tried to suggest that the real solution to the world’s problems was to write kernels in a modern “managed code” environment such as Java’s JVM or .NET’s CLR. My observation that they had just proposed a different kind of microkernel was met with sullen silence.

Microkernels are, and always have been, a good way to design operating systems. Note that I said design. Actually implementing microkernels as collections of separate heavyweight processes communicating via heavyweight IPC is not such a good idea. Like distributed filesystems (another technical area that is finally receiving some long-overdue attention), poor early implementations of a basically sound idea led the less informed to dismiss the idea itself. For a microkernel to be usable, the process/IPC model must allow efficient communication between components. In the extreme, the actual delivered implementation of a microkernel-based OS might even put all of those “processes” in a single address space, with IPC resolved down to simple function calls or even inline code. None of that makes the result any less of a microkernel design.

The defining characteristic of a microkernel is not the use of multiple address spaces but the idea that only a bare minimum of functionality is exempt from being hot-swappable. By hot-swappable I mean that anything other than the microkernel itself can be separately loaded, unloaded, and restarted, and to at least some extent can fail without bringing down the rest of the system. At a minimum this precludes a global panic() call. Generally it should also include an inter-component communication and error-handling model that prevents one component from explicitly waiting forever for another that has failed (though implicit waits, such as those I describe at the end of my explicit state article, are likely to remain a problem solvable only through programmer discipline). As I pointed out in the Slashdot thread, just having loadable/unloadable kernel modules is an important step in this direction and away from truly monolithic kernels, and for many good reasons it’s a step that even the anti-microkernel Linux made.
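To make that concrete, here’s a rough sketch in C of what I mean. Everything in it — ipc_send, port, and so on — is made up for illustration rather than taken from any real microkernel, but it shows how a co-located build can resolve “IPC” down to a call through a function pointer, and how a failed component produces an error return instead of leaving its callers waiting forever.

```c
/* Minimal sketch (hypothetical names, not any real kernel's API) of how
 * microkernel-style IPC can collapse to a plain function call when the
 * components share an address space. */
#include <stdio.h>

/* A message and a handler type shared by all components. */
struct msg { int op; char payload[64]; };
typedef int (*handler_fn)(struct msg *m);

/* One "port" per component; in this co-located build a port is just a
 * function pointer plus a liveness flag. */
struct port { handler_fn handler; int alive; };

/* Send with an explicit failure path instead of blocking indefinitely
 * on a dead component -- the error-handling discipline described above. */
static int ipc_send(struct port *dst, struct msg *m)
{
    if (!dst->alive)
        return -1;              /* component failed or was unloaded */
    return dst->handler(m);     /* "IPC" resolved to a direct call */
}

/* Example component: a trivial echo server that could be restarted
 * (re-registered) without touching anything else. */
static int echo_handler(struct msg *m)
{
    printf("echo: op=%d payload=%s\n", m->op, m->payload);
    return 0;
}

int main(void)
{
    struct port echo = { echo_handler, 1 };
    struct msg m = { 42, "hello" };

    ipc_send(&echo, &m);        /* succeeds */
    echo.alive = 0;             /* simulate the component failing */
    if (ipc_send(&echo, &m) < 0)
        printf("echo component down; caller keeps running\n");
    return 0;
}
```

A separate-address-space build could keep exactly the same interface and put a real message queue underneath it, which is the sense in which the single-address-space version is no less a microkernel design.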

Modularity and maintainability are sufficient justifications for microkernel design, but microkernels also have another advantage that I never got around to mentioning in the Slashdot thread: they are easier to make distributed. It’s simply easier to distribute components that already communicate through a defined IPC mechanism than those that use an incestuous “fat” interface involving all manner of calls and callbacks and twiddling each other’s bits (the sketch at the end of this post shows the difference in miniature). QNX is the premier example of this, achieving levels of fault tolerance of which Linux will simply never be capable. MOSIX is certainly good at providing an illusion of full distribution at the system-call level, but it’s a bit hacky and the illusion never quite matches a real distributed OS. Distributed OSes were, of course, one of Tanenbaum’s major interests, which might explain why he and Linus differ on the value of microkernels. Just as Linus once thought nobody needed SMP, and still thinks debuggers and specs are superfluous, he probably doesn’t see the value in distributed OSes, and therefore in the design approaches that facilitate them. The computing community in general should not be bound by one person’s short-sightedness, though. Microkernels are no magic bullet, but they have proven themselves as a good design approach for modern operating systems.
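Here’s that sketch. As before, it’s purely illustrative — none of these names come from QNX or any other real system — but it shows why a message-passing interface distributes so easily: the caller only ever hands a message to a send function, so the transport behind it can be a local dispatch or a network hop without the caller changing at all.

```c
/* Purely illustrative sketch of why a defined message-passing interface
 * is easier to distribute than a "fat" interface of calls, callbacks,
 * and shared data structures. */
#include <stdio.h>
#include <string.h>

struct msg { int op; char payload[64]; };

/* The transport is hidden behind a single send function. */
struct endpoint {
    int (*send)(struct endpoint *ep, const struct msg *m);
    void *ctx;      /* transport-specific data, e.g. a remote node name */
};

/* Local transport: deliver directly to an in-process handler. */
static int local_send(struct endpoint *ep, const struct msg *m)
{
    (void)ep;
    printf("[local] op=%d payload=%s\n", m->op, m->payload);
    return 0;
}

/* "Remote" transport: serialize the message as it would go on the wire.
 * (A real system would hand this buffer to a network stack.) */
static int remote_send(struct endpoint *ep, const struct msg *m)
{
    char wire[sizeof *m];
    memcpy(wire, m, sizeof *m);     /* trivial flat serialization */
    printf("[remote] %zu bytes queued for node %s\n",
           sizeof wire, (const char *)ep->ctx);
    return 0;
}

int main(void)
{
    struct msg m = { 7, "stat /tmp" };
    struct endpoint fs_local  = { local_send,  NULL };
    struct endpoint fs_remote = { remote_send, "node2" };

    /* The caller's code is identical either way; nothing needs to be
     * untangled before the component can move to another node. */
    fs_local.send(&fs_local, &m);
    fs_remote.send(&fs_remote, &m);
    return 0;
}
```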