Over on Greg Laden’s somewhat hyperactive blog, I got into a little bit of a debate about whether or not stable APIs (Application Programming Interfaces) are a good thing. That led to a second post, in which Greg L cites Linux guru Greg Kroah-Hartman on the subject. Since Greg L had explicitly asked my opinion about what Greg K-H has to say, I’ll respond here and link back to there. First, let’s deal with a little bit of Greg K-H’s introduction.

> Please realize that this article describes the _in kernel_ interfaces, not the kernel to userspace interfaces. The kernel to userspace interface is the one that application programs use, the syscall interface. That interface is _very_ stable over time, and will not break. I have old programs that were built on a pre 0.9something kernel that still work just fine on the latest 2.6 kernel release. That interface is the one that users and application programmers can count on being stable.

What Greg K-H is talking about is therefore actually an SPI (System Programming Interface). Normally it would be OK to use the more familiar term API anyway, but in the interests of clarity I'm going to adopt the strict usage here. The point I want to make is that most of the supporting-legacy-code bloat that Greg L cites as a major liability for Windows is not in the SPI Greg K-H is talking about. All of that code to support old applications lives on the API side, and there's plenty of the same sort of cruft in Linux's API for the same reasons.
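To make the distinction concrete, here's a minimal userspace sketch (my own illustration, not anything from either Greg) that touches only the syscall interface Greg K-H describes as stable:

```c
/* A minimal sketch of the stable kernel-to-userspace API (the syscall
 * interface).  Nothing here depends on in-kernel data structures, which
 * is why a binary like this keeps working across kernel releases. */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    /* The libc wrapper and the raw syscall number both go through the
     * same stable kernel-to-userspace interface. */
    printf("getpid() via libc:    %ld\n", (long)getpid());
    printf("getpid() via syscall: %ld\n", (long)syscall(SYS_getpid));
    return 0;
}
```

An out-of-tree driver is in the opposite position: it compiles against in-kernel headers whose structures and function signatures are free to change from one release to the next, and that moving target is the SPI the rest of this post is about.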

Moving right along, what Greg K-H seems to be arguing is not that a stable SPI is worthless in and of itself, but that the need for one is largely obviated by having your code be part of the mainline Linux kernel. He's actually right in a particular context, but I don't think it's really the proper context, for three reasons.

1. There are things out there besides PCs
If the code in question is a driver for a consumer-grade Ethernet card that dozens of kernel hackers are likely to use in their own machines, then most of Greg K-H's reasons for getting code into the kernel make sense. Other developers probably will be able to add features, fix bugs, or make SPI-related changes to your code, and then test those changes. On the other hand, if you're working on code for a cutting-edge high-end system that uses one of the less common CPU architectures, with an unusual and notoriously tricky memory-ordering model and a lot of other features that bear little relation to anything in a desktop PC, it's a different story. Other developers might well be able to make useful changes for you. They also might screw those changes up in very hard-to-debug ways, particularly if they bring PC assumptions with them to something that isn't a PC (the sketch below shows the sort of thing I mean). They certainly won't be able to test their changes. In the end, getting your code into the mainline kernel won't reduce your maintenance burden a whole lot.
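Here's a hypothetical kernel-style sketch (the names are mine, not from any real driver) of a producer/consumer handoff between CPUs. On x86 the stores below are observed in program order, so leaving the barriers out appears to work; on a weakly ordered architecture the flag can become visible before the data, a bug that only shows up on that hardware:

```c
/* Hypothetical sketch, not taken from any real driver: one CPU publishes
 * data, another consumes it once a flag is raised. */
#include <asm/barrier.h>	/* smp_wmb(), smp_rmb() in current kernels */

static int shared_data;
static int ready;

/* Producer: publish the data, then raise the flag. */
static void publish(int value)
{
	shared_data = value;
	smp_wmb();	/* make the data visible before the flag */
	ready = 1;
}

/* Consumer: check the flag, then read the data. */
static int consume(void)
{
	if (!ready)
		return -1;	/* nothing published yet */
	smp_rmb();	/* don't read the data until after the flag */
	return shared_data;
}
```

Strip out the smp_wmb()/smp_rmb() pair and this still passes every test on an x86 box, which is exactly the kind of change an outside contributor could make in good faith and never be able to test on the hardware where it matters.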

2. Everyone has to play along
It’s one thing to say that you’ll play along and get your code into the kernel, but many projects nowadays involve drivers and patch sets from many sources. Getting your own patch A, and/or patch B from vendor X, into the kernel might not help you all that much if vendors Y and Z decline – for whatever reason, good or bad – to do likewise for patches C through G. Sometimes they want to, but their patches are rejected by the Kernel Gods for reasons that might be legitimately technical but are just as often pettily political. Either way, you’re still going to be stuck with complex multi-way merges cascading outward any time any of those components moves forward, either because the vendor did something or because someone else changed part of the SPI. In other words, you don’t really gain much until that last vendor joins the club. By contrast, every attempt to preserve SPI compatibility brings its own immediate gain, even if other parts of the SPI change.

3. It doesn’t scale
“One kernel to rule them all and in the darkness bind them” solves some problems, but bundling every bit of device- and architecture-specific code with the kernel has its costs too. How many millions of hours have been wasted by developers configuring, grepping through, and diffing against PA-RISC and s390 code even though they’ll never use anything but an x86 on their project, just because they do those commands at the whole-tree level and never got around to automating the process of pruning out irrelevant directories every time they sync their tree? Even worse, how many times has someone who thought of making a change balked at the idea of applying it to every file it affects? How many times have they forged ahead anyway, but then gotten bogged down dealing with bugs in a few particularly intransigent variants? How much time has been wasted on the LKML flame wars that result? Preserving every architecture and device in one tree can have the same chilling effect on development as preserving every past SPI version.

If you’re living in the same desktop-and-small-server universe that Windows lives in, and you don’t have to deal with development partners who can’t or won’t get their own patches into the kernel, then getting your own code into the kernel might seem to obviate the need for a stable SPI and bring other advantages besides . . . this year. Down the road, or if those constraints don’t apply to your project, that might not be the case. SPI instability is a bad thing in and of itself, even if the pain doesn’t seem too great or there are other reasons to endure it. As I said to Greg L, it’s not something that makes Linux better than Windows. It’s an artifact of where Linux is on the adoption curve, quite reasonably more concerned about attracting new users than about alienating old ones. As Linux climbs that adoption curve, perhaps to the point of becoming a dominant operating system, I think that calculus will change. Breaking SPI compatibility is sometimes justified, but it’s almost never anything to be proud of.