I’ve continued to be active in the Linux-VM debate over on Slashdot. One sub-thread stemming from the post I mentioned yesterday has focused on the software-process aspects of the situation. In an entirely different thread I’ve been one among many correcting a whole bunch of untruths and misconceptions about how virtual-memory systems work. Of course, no discussion of Linux-VM would be complete without me pointing out the total stupidity of the “OOM killer” idea.
I just took the Animal In You test. Apparently I’m a wild dog, with warthog as a second choice, and when I look through all the other alternatives I think I’d be a decent badger as well. Obviously, platypus wasn’t on the list.
In the course of a conversation with Zooko, I started thinking about the transient nature of “expertise”. Both he and I might be considered experts in certain areas, centered around distributed data replication/distribution/etc. What I wonder is whether this expertise really represents our having climbed some particularly high pinnacle of knowledge, or whether it’s more like we’ve begun to open a gate that everyone will soon be walking through. Maybe ten years from now we’ll try to explain our contributions to some young punk who’ll respond, “What’s the big deal? Every first-term CS major knows that’s how it’s done.”
Sadly, that seems to be the general fate of innovators. People like Turing, von Neumann, or Dijkstra developed ideas so useful and so general that people just take them for granted now and can’t imagine that anyone ever considered alternatives. Nobody thinks twice about cache-coherent SMP machines now, but even within my own career I remember when that was new territory and nobody could quite agree on how to solve the basic problems. That stuff “just works” now, but only because hundreds of people tore their hair out developing methods and protocols that actually worked. Ditto for high-availability clusters. Maybe it’ll be the same for wide-area data access. Zooko says that will be a good thing, because at least it’ll reduce the number of people repeating the same basic mistakes. I guess he’s right, but it still saddens me that even the greatest innovators’ contributions can be forgotten so quickly.
I haven’t posted anything about Linux here for a while. Here’s a little piece I wrote on Slashdot about the virtual-memory-system mess going on in 2.4, with a segue into why 2.5/2.6 should have been started long before now.
IMO both Rik’s code (RVM) and Andrea’s (AVM) were accepted prematurely, and Linus’s ADD is the root of the problem here. Everyone thought the 2.2 VM was broken, so he jumped on RVM when it really hadn’t received adequate testing with various workloads. Then, when that didn’t work out, he did something even worse by jumping on AVM in the middle of a “stable” kernel series when it was totally undocumented and even less thoroughly tested than RVM. That’s just bad software engineering, regardless of the quality of Rik’s or Andrea’s work.
Ideally, an “old-fashioned” alternative to RVM would have been maintained throughout the 2.3 process, as a fallback in case RVM turned out not to be ready for 2.4 – which was in fact the case. But this wasn’t done, there was no alternative, and so RVM became the basis for 2.4. Once that decision was made it should not have been unmade by replacing RVM with AVM. Andrea’s work should have been in the 2.5 tree, which should have been opened a long time ago to deal with precisely this sort of situation. 2.4 is not the last Linux kernel that will ever exist. We don’t need to make it perfect. It would be far better to admit its imperfections, band-aid them as best we can, and try to get a head start on creating something better for 2.6. What we have instead is error on top of error, “not ready” replaced with “even less ready”.
To clarify, I have nothing but the highest regard for both Rik’s and Andrea’s work. Obviously they have different ideas and attitudes. Rik has drawn on many sources in his design, resulting in a system that is both very advanced and very complicated. The process of reining in the complexity is still incomplete, but I still have hope that some day Rik will be able to come up with something that’s really awesome, and he has always documented his ideas thoroughly. Andrea, by contrast, is much more pragmatic; he wants something that works now even if it’s somewhat more limited in scope (e.g. by being almost impossible to reconcile with NUMA). The dark side of that “pragmatism” is that Andrea has skimped on non-code activities such as documenting or explaining the basic ideas on which his system is based. Nonetheless, both have done great work and should continue to do great work…in the 2.5 tree.
Here’s a review of Curt Schimmel’s UNIX(TM) Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers that I originally wrote for Amazon. Unfortunately, my review got eaten. Whatever Amazon might claim, I’ve noticed that the probability of my reviews being “lost” is directly related to how positive or negative they are. You can guess what I think of that, but I’ll let you draw your own conclusions. In any case, I’ll continue resubmitting the same review once a week until it gets posted or I hear a good reason why it won’t be. Until then, here’s the HTML form.
In many ways this is a great book. The subject is one that is known to induce headaches, and the author covers it with truly admirable clarity. It’s worth buying the book for the chapter on cache consistency alone; like many others, I had to spend years piecing the same information together from varied sources, and it would be hard to overstate the value of having it all in one place.
So why only three stars? The problem is that the book is incomplete. Cache systems and virtual-memory systems interact in myriad ways, but you wouldn’t know that from reading this book. Similarly, storage and networking subsystems are often the bloodiest battlegrounds with respect to multiprocessor synchronization, and yet special considerations in those areas are not covered. Many old architectures (e.g. Apollo, ELXSI) are mentioned, and yet NUMA never even gets a nod. I know that covering all of these topics in any kind of depth would be impossible in a single book of any reasonable length, but their total omission is something I consider unacceptable.
This is a book I would recommend without hesitation to any number of people. Unfortunately, that recommendation would always have to be accompanied by recommendations for other books that pick up where this one inexplicably leaves off.
I ran some tests to measure performance implications of the multithreading approaches mentioned yesterday. The tests consisted of a server implemented using a chosen model, plus two single-threaded clients running on separate machines just blasting simple requests at the server as fast as they can and ignoring responses. Each request has a 97% chance of completing without any sort of delay; the remaining 3% experience a delay of exactly one second.
Here are some results:
- Asymmetric multithreading: 7K requests/sec
- Symmetric multithreading: 20K requests/sec
- On-demand multithreading: 23K requests/sec
- Event-based (“monothreading”): 24K requests/sec
Obviously, asymmetric multithreading sucks. It’s by far the worst model for this (common) kind of workload. It’s also notable that the big jump in performance comes when you switch from asymmetric to symmetric multithreading. I had expected the big jump to be between symmetric and on-demand multithreading, but here I have to confess that I made an error yesterday. In asymmetric multithreading there are at least two context switches per request, not one: from the listener to the worker and back to the listener. Symmetric multithreading cuts this in half by requiring only one worker-to-worker switch per request, and that’s why there’s a big jump. Apparently, there aren’t that enough avoidable context switches left after that to make much of a difference.
The most interesting comparison, perhaps, is between on-demand multithreading (ODM) and event-based programming (EBP). For starters, there’s just not that much difference. ODM benefits more from parallel operation, while EBP involves less coordination overhead, but in the end it’s a wash. Perhaps more importantly, ODM is more “transparent”; it doesn’t require explicit state maintenance like EBP does, handles certain common constructs (e.g. retry loops) more cleanly, and does not rely on an environment that supports true callback-based asynchronous I/O.
It seems to me that EBP only makes sense when it’s fully supported by the environment, when the reduction in coordination overhead outweighs the reduced parallelism, and there’s no less painful way to squeeze a few percent out of your application. I know a lot of EBP advocates are likely to read this and disagree with me, but I have to say that I see no performance argument favoring EBP over reasonably-implemented multithreading. If anybody would like to make such an argument, it won’t be very convincing unless it’s based on real numbers for realistic workloads.
I’ve been thinking about event-based vs. multithreaded programs again. Some thoughts from last time are here in case you didn’t know this is a recurring thing for me. My general attitude is that neither model is inherently “better” than the other; it all depends on what you’re doing. Today’s thought is that a key factor is which approach does the best job of avoiding context switches in your situation. Consider a classic server-type application that spends its life responding to a series of independent requests. There are several common models:
- In the “asymmetric multithreading” model, there are listener threads and worker threads, which communicate through one or more request queues. Listeners are bound to communication channels, each one repeatedly reading requests and placing them on request queues; workers are bound to request queues, each one repeatedly dequeuing and servicing requests. Each request therefore involves exactly one context switch from a listener to a worker, and possibly more if the workers contend with one another for resources.
- In the “symmetric multithreading” model, there is no distinction between listeners and workers. Each thread loops through the following steps:
- Take an incoming-request lock.
- Read one request.
- Make sure there’s another thread to listen for the next request.
- Release the lock.
- Service the request.
This approach has the same context-switch characteristics as the previous one, but represents an evolutionary step toward the next approach.
- “Threading on demand” is almost identical to symmetric multithreading, except that steps 3 and 4 are deferred whenever possible. A thread X goes on its merry way without worrying about whether there’s anyone left to read the next request. It’s only when X needs to block that it lets someone else take over reading the next request; if X never blocks, no thread creation or switching occurs. This approach avoids most of the context switches associated with other multithreading approaches, without sacrificing the ability to block when needed and pick up transparently where one left off.
- In the event-based model, there is only one thread, which repeatedly fetches an event – not a request – from a queue and then processes it. Processing of a single request might involve multiple events. In situations where a multithreaded server might block, the event-based server manually saves state and then goes on to process the next event; when the blockage is resolved that will be dealt with in the handler for the unblocking event.
The event-based model involves no contention and no context switches, for the very simple reason that there are no contexts to contend or switch. There is one fly in the ointment, though: how do events get on the event queue? In a lot of cases, people start with an event-based model and then add per-socket listener threads which put events in the queue when requests arrive. What they end up with is not really event-based at all, but asymmetric multithreading in disguise. Sometimes they don’t even realize it, because the listener threads are hidden in someone else’s library. A similar trap involves adaptation libraries that use extra threads to convert synchronous I/O to asynchronous. No matter how it happens, the result is more context switches and lower performance than if they had used symmetric multithreading to start with.
True event-based programming is possible in some environments where true asynchronous I/O facilities (callbacks, not poll/select or O_NONBLOCK) are available, and in those environments event-based models can be a big win. This is commonly the case in OS kernels, embedded environments, or (oddly) Windows NT but not in most flavors of UNIX. In environments lacking true asynchronous I/O, event-based programming almost invariably degenerates into asymmetric multithreading at some level, and shares all of that model’s performance disadvantages. Do you know which model you’re really using?
This was part of a discussion on making the transition from coder to architect. You can see my original response in the previous log entry, but when I saw this list of ways to be the sort of architect everyone hates, I felt compelled to respond in a different way.
I’ll translate a few of these, for those who haven’t learned to read between the lines.
Clearly define yearly goals. Make sure they’re realistic and qualitative, not quantitative
“It’s harder to lie about whether you met a goal when it has numbers attached to it.”
Touch every project you know of that’s related to your work. No need to get heavily involved
“Drive everyone crazy by constantly pestering people for information and/or making useless suggestions based on superficial understanding of the real problems. Come review time, you can claim you helped everyone, and the damage you’ve inflicted on their schedules will make you look good by comparison.”
Write quarterly reports. Trump up any work you’ve done on popular projects, keep work on politically sensitive projects to a few lines.
“Take credit, avoid blame.”
Request at least two weeks of training a year…Request to go to at least two conferences per year
“Spend as much time as possible away from the office so that the people doing real work can’t hunt you down and give you that well-deserved kick in the teeth.”
if you need some ideas on training and seminars…just go here
“I have a financial arrangement with these guys.”
Get used to the idea that as an architect you will no longer be able to measure your productivity in terms of lines of code or any similar “objective” measure. When I first started getting more involved in architect-level activities, I saw that my productivity as a coder was declining and I was quite distraught. It took me a long time to reconcile myself to the idea that code was no longer my main contribution, and that I had to find more flexible ways to determine whether I was functioning optimally. This is also a time-management problem, as you become less able to use checkin trails etc. to keep yourself on schedule.
Accept that you cannot escape your responsibility to be a leader, mentor, etc. Think of yourself as a high-level NCO on the battlefield. You’re not an officer making command decisions and you’re not some paper-pusher who never picked up a gun; those are the executives and managers respectively. Instead, you’re in the foxholes with the grunts, fighting the same war they are. Your leadership consists of communicating basic skills and priorities, managing morale and discipline, acting as an advocate, and generally setting a good example. If you’re not comfortable doing all of these things, find a different role, perhaps one that concentrates more on specialized technical skills, because nobody is more universally loathed – by superiors, peers, and more junior team members – than a tech lead or architect who doesn’t help to “stiffen the backbone” of the organization.
In a similar vein, your new position makes you a target for the climbers and backstabbers in your company. Everything you say will travel further and carry more weight than it did before, with potentially disastrous consequences if you’re not careful. A grunt can say almost anything because, basically, nobody will really notice or care. When you’re an architect that’s not the case. I know it seems political, but it pays to develop “situational awareness” of who you’re talking to, what their agendas are, who they’re likely to repeat your words to, etc. It’s distasteful, but if you want your people or your projects or your principles to prevail then you have to avoid traps and ambushes.