Apparently Linus has made another of his grand pronouncements, on a subject relevant to this project (thanks to Pete Zaitcev for bringing it to my attention).
“People who think that userspace filesystems are realistic for anything but toys are just misguided.”
I beg to differ, on the basis that many people are deploying user-space filesystems in production to good effect, and that by definition means they’re not toys. Besides the obvious example of GlusterFS, PVFS2 is almost entirely in user space and it has been used to solve some very serious problems on some seriously large systems for years. Everything Linus has worked on is a toy compared to this. There are several other examples, but that one should be sufficient.
So where does Linus’s dismissive attitude come from? Only he can say, of course, but I’ve seen the same attitude from many kernel hackers and in many cases I do know where it comes from. A lot of people who have focused their attention on the minutiae of what’s going on inside processors and memory and interrupt controllers tend to lose track of things that might happen past the edge of the motherboard. This is a constant annoyance to people who work on external networking or storage, and the problem is particularly acute with distributed systems that involve both. Sure, there are inefficiencies in moving I/O out to user space, but those can be positively dwarfed by inefficiencies that occur between systems. A kernel implementation of a bad distributed algorithm is most emphatically not going to beat a user-space implementation of a better one. When you’re already dealing with the constraints of a high-performance distributed system, having to deal with the additional constraints of working in the kernel might actually slow you down. It’s not that it can’t be done; it’s just not the best way to address that class of problems.
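To put rough shape on that comparison, here is a back-of-the-envelope sketch. Every number in it is an assumption I picked for illustration, not a measurement of any real system, and `op_latency_us` is a made-up helper. It models one operation as local overhead plus network round trips, then compares a kernel implementation of a chattier protocol against a user-space implementation that needs only one round trip.

```python
def op_latency_us(local_overhead_us, round_trips, rtt_us):
    """Toy model: latency of one operation = local cost + network cost."""
    return local_overhead_us + round_trips * rtt_us


# All values below are illustrative assumptions, not measurements.
RTT_US = 200              # assumed network round-trip time
KERNEL_OVERHEAD_US = 5    # assumed per-operation cost when staying in the kernel
USER_OVERHEAD_US = 30     # assumed cost including extra context switches and copies

# Kernel implementation of a protocol that needs three round trips.
kernel_bad_algorithm = op_latency_us(KERNEL_OVERHEAD_US, 3, RTT_US)

# User-space implementation of a protocol that needs one round trip.
user_good_algorithm = op_latency_us(USER_OVERHEAD_US, 1, RTT_US)

print(kernel_bad_algorithm, user_good_algorithm)  # 605 230
```

Under these assumptions the user-space overhead is six times the kernel's, yet it is lost in the noise next to two extra round trips. Different numbers shift the balance, but the network term is the one that dominates once it is large.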
The inefficiency of moving I/O out to user space is also somewhat self-inflicted. A lot of that inefficiency has to do with data copies, but let’s consider the possibility that there might be fewer such copies if there were better ways for user-space code to specify actions on buffers that it can’t actually access directly. We actually implemented some of these at Revivio, and they worked. Why aren’t such things part of the mainline kernel? Because the gatekeepers don’t want them to be. Linus’s hatred of microkernels and anything like them is old and well known. Many other kernel developers have similar attitudes. If they think a feature only has one significant use case, and it’s a use case they oppose for other reasons, are they going to be supportive of work to provide that feature? Of course not. They’re going to reject it as needless bloat and complexity, which shouldn’t be allowed to affect the streamlined code paths that exist to do things the way they think things should be done. There’s not actually anything wrong with that, but it does mean that when they claim that user-space filesystems will incur unnecessary overhead they’re not expressing an essential truth about user-space filesystems. They’re expressing a truth about their support of user-space filesystems in Linux, which is quite different.
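For a sense of what “specifying actions on buffers you can’t access” looks like, here is a minimal sketch using an interface that does exist in mainline Linux: sendfile(2), reached through Python’s `os.sendfile`. This is not what we built at Revivio, and it is far narrower than what a user-space filesystem would need; it only shows the general shape of the idea. The function name `copy_in_kernel` and the chunk size are my own choices, and the sketch assumes Linux, where the output descriptor may be a regular file.

```python
import os


def copy_in_kernel(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst by asking the kernel to move the bytes.

    The process only names descriptors, an offset, and a length;
    the payload is never read into this process's own buffers.
    Assumes Linux, where sendfile() accepts a regular file as output.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        while copied < size:
            # sendfile may transfer fewer bytes than requested, so loop.
            sent = os.sendfile(dst.fileno(), src.fileno(),
                               copied, min(chunk, size - copied))
            if sent == 0:
                break  # source was truncated underneath us
            copied += sent
    return copied
```

Calling `copy_in_kernel("in.bin", "out.bin")` returns the number of bytes moved. The point is that user-space code made the decision about what goes where without ever touching the data itself.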
A lot of user-space filesystems – perhaps even a majority – really are toys. Then again, is anybody using kernel-based exofs or omfs more seriously than Argonne is using PVFS? If you make something easier to do, more people will do it. Not all of those people will be as skilled as those who would have done it The Hard Way. FUSE has definitely made it easier to write filesystems, and a lot of tyros have made toys with it, but it’s also possible for serious people to make serious filesystems with it. Remember, a lot of people once thought Linux and the machines it ran on were toys. Many still are, even literally. I always thought that broadening the community and encouraging experimentation were supposed to be good things, without which Linux itself wouldn’t have succeeded. Apparently I’m misguided.