McCusker has written some more about multithreading vs. event queuing. Unfortunately, about 70% of what he writes is really about how “microthreads” work in an interpreted virtual machine. That’s a fine topic, but one I consider orthogonal to the question of which programming model to use for network applications, so I’ll just comment on the other 30%.

Here’s the first bit that caught my eye:

The big problem I want to solve is independence from native threads…it’s huge to be able to go around and ignore a thread showstopper.

One can continue development in other areas, without getting hung up.

Being able to use the same code in both single-threaded and multi-threaded environments is a noble goal. However, my experience with such systems has been pretty disappointing. Several times I’ve worked on systems that promised this feature, and they never worked quite the same in the two modes. The promise was to make debugging easier; the actual result was often that problems had to be debugged twice – once in each mode. To be fair, all of these systems (that I can recall) approached dual-mode operation from the direction of having originally been multi-threaded. Maybe something that started out more on the event-based side and evolved to make use of multiple threads would meet with greater success…or maybe not.

Second quote:

Event queues can be used as a bridging device that works both with threads and without threads.

Agreed, but with some caveats. Here are two common pitfalls I’ve encountered in queue-based systems:

  1. Too many back-to-back queue/dequeue operations. Many queue-based systems use an interface where a request is always put on the recipient’s queue, then the recipient’s dispatch routine is called…and 99% of the time the very first thing it does is take that same request right back off the queue. That’s kind of silly. IMO the sender should invoke the recipient’s dispatch routine (or a stub thereof) directly, and requests should be processed immediately or put on a queue for later at the dispatch routine’s discretion (see the sketch after this list). Most often, only the recipient module – i.e. not the sender and not the infrastructure – can make an intelligent decision about whether to queue something.
  2. Complex queuing. Too many event-queuing systems have interfaces that just sort of grow without bound. First they turn a simple queue into a multi-level priority queue. Then someone adds time to the interface so that they can use the event queue for timeouts as well (a mistake for other reasons too, IMO). Sometimes they add flow control. By the time all these features are added, what was once a simple “here, process this” interface has become an ultra-hairy mess where passing a request from one module to another has become a major undertaking. In extreme cases, this has caused programmers to abandon the original queuing infrastructure in favor of reinventing it in its original simple form for use within a single module.
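
To make point 1 concrete, here’s a minimal C sketch of the direct-dispatch pattern. Every name in it (request_t, module_t, module_dispatch) is hypothetical, and the “processing” is just a printf, but the shape is the point: the sender calls the recipient directly, and only the recipient decides whether to defer.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical request type, with an intrusive link for queuing. */
    typedef struct request {
        int             id;
        struct request *next;
    } request_t;

    /* Recipient module, which owns its queue and its queuing policy. */
    typedef struct {
        request_t *head;
        request_t *tail;
        bool       busy;   /* already working on an earlier request */
    } module_t;

    static void enqueue(module_t *m, request_t *r)
    {
        r->next = NULL;
        if (m->tail) m->tail->next = r; else m->head = r;
        m->tail = r;
    }

    /*
     * The sender calls this directly -- no mandatory enqueue followed
     * by an immediate dequeue.  The queue is touched only when the
     * request actually has to wait.
     */
    void module_dispatch(module_t *m, request_t *r)
    {
        if (m->busy) {
            enqueue(m, r);          /* defer, at the module's discretion */
            return;
        }
        printf("processing request %d immediately\n", r->id);
    }

    int main(void)
    {
        module_t  m  = { 0 };
        request_t r1 = { .id = 1 };
        request_t r2 = { .id = 2 };

        module_dispatch(&m, &r1);   /* processed on the spot */
        m.busy = true;              /* pretend the module is occupied */
        module_dispatch(&m, &r2);   /* quietly deferred */
        return 0;
    }

In the common case the request never touches a queue at all; the enqueue/dequeue pair happens only when there’s a real reason to wait.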

I do understand the temptation to evolve queuing systems in this way. In my own project, requests involving the same object are generally serialized, except for one special case where – to avoid deadlock – an inbound invalidate request has to “overtake” an outbound I/O request. I was tempted to turn my queues into priority queues, and at one time I probably would have, but I’ve been down that road before. Instead, I changed the queuing model so that queuing is done internally to each module, using library routines, with an indication back to the infrastructure of whether the request was queued or completed (sketched below). The infrastructure is thus even simpler than it was before (when it handled queuing), and modules are able to implement arbitrarily complex queuing internally if they need to.
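
Here’s roughly what that contract might look like in C. Again, every name here (req_status_t, local_enqueue, infra_submit) is something I’ve invented to illustrate the idea, not code from my project: the module queues internally via library routines, and all the infrastructure ever learns is whether the request was queued or completed.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct request request_t;

    struct request {
        int        id;
        request_t *next;
    };

    /* What the module reports back to the infrastructure. */
    typedef enum {
        REQ_COMPLETED,   /* handled synchronously */
        REQ_QUEUED       /* held inside the module; finishes later */
    } req_status_t;

    typedef struct {
        request_t *head, *tail;   /* module-private queue */
        bool       serialized;    /* an earlier request is in flight */
    } module_t;

    /* Library routine: append to the module's private queue. */
    static void local_enqueue(module_t *m, request_t *r)
    {
        r->next = NULL;
        if (m->tail) m->tail->next = r; else m->head = r;
        m->tail = r;
    }

    /*
     * The queuing policy -- including any "overtaking" special cases --
     * lives entirely in here.  The infrastructure sees only the status.
     */
    req_status_t module_dispatch(module_t *m, request_t *r)
    {
        if (m->serialized) {
            local_enqueue(m, r);
            return REQ_QUEUED;
        }
        /* ... process r synchronously ... */
        return REQ_COMPLETED;
    }

    /* Infrastructure side: hand the request off and move on. */
    void infra_submit(module_t *m, request_t *r)
    {
        if (module_dispatch(m, r) == REQ_QUEUED) {
            /* nothing to do; the module completes r on its own */
        }
    }

    int main(void)
    {
        module_t  m = { 0 };
        request_t r = { .id = 1 };
        infra_submit(&m, &r);
        return 0;
    }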

There’s actually another reason I went this route. My system is designed to handle requests for millions of separate blocks. There’s no way in hell that I can afford to have a separate queuing-system entity for each block, but I do need to be able to queue requests separately for each one to avoid loss of parallelism due to false queuing dependencies. Should the queuing infrastructure need to know that I’m queuing per-block instead of globally? Hell, no. Again, the queuing rules should be known to the individual module, not exposed to the infrastructure.
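
For what it’s worth, per-block queuing doesn’t require a heavyweight queue object per block either. A module can key its internal queues by block number and create them lazily, so idle blocks cost nothing. A hypothetical sketch, with all names invented for illustration:

    #include <stdint.h>
    #include <stdlib.h>

    #define NBUCKETS 1024   /* hash buckets, not one entity per block */

    typedef struct request {
        uint64_t        block;   /* which of the millions of blocks */
        struct request *next;
    } request_t;

    /* A queue exists only for blocks that have waiting requests. */
    typedef struct blockq {
        uint64_t       block;
        request_t     *head, *tail;
        struct blockq *next;     /* hash-chain link */
    } blockq_t;

    static blockq_t *buckets[NBUCKETS];

    /* Find the queue for one block, creating it lazily. */
    static blockq_t *get_blockq(uint64_t block)
    {
        blockq_t **bp = &buckets[block % NBUCKETS];

        for (blockq_t *q = *bp; q != NULL; q = q->next)
            if (q->block == block)
                return q;

        blockq_t *q = calloc(1, sizeof(*q));   /* error check elided */
        q->block = block;
        q->next  = *bp;
        *bp      = q;
        return q;
    }

    /* Queue a request behind others for the same block only. */
    static void enqueue_for_block(request_t *r)
    {
        blockq_t *q = get_blockq(r->block);

        r->next = NULL;
        if (q->tail) q->tail->next = r; else q->head = r;
        q->tail = r;
    }

    int main(void)
    {
        request_t r = { .block = 123456789ULL };
        enqueue_for_block(&r);   /* queue for this block appears lazily */
        return 0;
    }

Requests for different blocks never see each other’s queues, so there are no false dependencies, and the infrastructure still sees nothing but the queued-or-completed status from before.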

I’m not disagreeing with David on either of the points I’ve addressed. I think he’s very much on the right track in both regards – not that my opinion on that matters more than that of any other reasonably experienced network hacker. I hope my comments can be accepted in the intended spirit of comparing notes and thinking things through collaboratively to arrive at an optimal design. I also hope others will be encouraged to jump in and share their own experience with these sorts of systems.