It was kind of fun getting involved in the fsync debate, so now I’ll get involved in another ongoing controversy: software transactional memory. If you haven’t already heard of STM, Wikipedia has a good intro – albeit one obviously written by an STM proponent. Such proponents often try to position STM as the solution to all of our concurrent-programming problems. However, as often happens when the advocacy for something becomes too strident, there seems to be a growing backlash from people pointing out the supposed panacea’s limits and problems.

One example of this is Patrick Logan’s rant about STM, and the ensuing discussion at Lambda the Ultimate. In the discussion, there seems to be a general consensus that locking and shared state don’t scale very well, but then there’s a split between those who propose STM as a solution and those who propose a more Erlang-like “shared nothing” approach based on messaging. Every once in a while – too rarely, in my opinion – somebody notices that putting operations under a lock and putting them inside a transaction are actually very similar. Transactions offer some desirable properties such as isolation and composability, but the two approaches have much in common and require similar diligence from the programmer. Races, deadlock/livelock, starvation, and so on are still entirely possible with STM if the programmer is careless – and we all know they are.

On LtU, Tim Sweeney weighs in on the pro-STM side.

You could implement this using message passing, but you’d be writing and debugging your own ad-hoc transactional protocol for each of the tens of thousands of updates and state transitions present in the program.

I understand his concern. Data-consistency problems often go unrecognized as such by programmers who lack the proper training or experience. People given messaging often end up implementing their own kind of distributed state sharing, whether they realize it or not, and almost invariably implement it poorly, so that they have stale or inconsistent data everywhere. Why not give them STM-based state sharing that avoids at least some of these problems? Because each transaction having its own view of shared state creates the same consistency problems as each process having its own view, and those problems become more likely as people take advantage of all the composability that is supposedly such a great feature of STM.

Also, and perhaps more importantly to me than to some, we have to consider the distributed case. Distributed STM (DSTM) is essentially the same problem as software distributed shared memory (SDSM). I’ve actually worked on SDSM systems, so I’m pretty familiar with their problems and limits. Over the years I’ve watched one group after another make grandiose claims that they’ve solved the problems and overcome the limits, only to have the actual results disappoint. That’s why I view claims such as ScaleMP’s with some skepticism, and claims backed by even less proof than theirs with something like derision. Show me. Tell me the details, and show me the results – on real applications, not some rigged comparison where an optimized STM implementation is compared to an unoptimized lock-based one to achieve the desired result. Until then, I’ll continue to believe that performance of DSTM is likely to be so bad that programmers will avoid it in most (if not all) cases despite its supposed advantages, so at most DSTM will be relegated to niche use.

Experience trumps theory, of course, so it’s interesting that the most concrete empirical observation I saw in these discussions was from Mental on Patrick’s site.

Having written several implementations of both STM and various message-passing-based concurrency models for Ruby lately, I’m a lot less sunny on STM than I used to be even a few weeks ago.

I was having fun implementing STM until I realized that I was able to implement the Actor model correctly in a few hours, versus several weeks for getting the fiddly aspects of STM down.

The biggest immediate problem for STM is starvation — a large transaction can just keep retrying, and I’m not sure there is a way to address that without breaking composability. …and composability is the whole raison d’être of STM in the first place.

That’s a pretty good example of STM failing to solve a problem – in this case process starvation – that its proponents commonly portray as characteristic of lock-based systems. How, indeed, does one guarantee fairness and forward progress, or even avoid livelock, in an optimistic-concurrency model where you just retry everything when contention is detected? I’m sure somebody has come up with solutions. I’m equally sure those solutions are rarely implemented in actual STM code, so real programmers will encounter these real and hard-to-debug problems when they try to follow the STM advocates’ advice.
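Mental’s starvation scenario is easy to model even without threads. In this toy sketch (all names are mine, and it’s single-threaded for determinism), a “small” transaction commits during each of the large transaction’s attempts, so the large one’s snapshot validation fails every time:

```python
# Toy single-threaded model of optimistic retry: each time the "large"
# transaction does its speculative work, a "small" transaction sneaks in
# a commit, invalidating the large one's snapshot.  Illustrative only.

class TVar:
    def __init__(self, value):
        self.value = value
        self.version = 0

def small_commit(tvar):
    tvar.value += 1
    tvar.version += 1

def large_transaction(tvar, max_attempts):
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        snapshot_version = tvar.version
        result = tvar.value * 2          # long speculative computation...
        small_commit(tvar)               # ...during which a small tx commits
        if tvar.version == snapshot_version:   # validation always fails here
            tvar.value = result
            tvar.version += 1
            return attempts, True
    return attempts, False               # starved: gave up after max_attempts

attempts, committed = large_transaction(TVar(1), max_attempts=100)
# the large transaction retries 100 times and never commits
```

A real scheduler could break the tie (e.g. by falling back to pessimistic locking after N retries), but any such fallback is exactly the kind of global policy that cuts against composability.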

Probably the most damning point about the limitations of STM was made by Sriram Srinivasan on LtU.

Second, there are many areas that don’t come under the purview of STM (file activity, screen updates, real db transactions with different tx semantics). They don’t work under an arbitrary retry model, unlike in a locking case where you know you own the critical section.
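The retry model’s trouble with side effects is easy to demonstrate. In this toy sketch (hypothetical names, with a list standing in for any irreversible action – a file write, a screen update, a real database call), the side effect sits inside an optimistically retried body, so every retry repeats it:

```python
# Why I/O inside a retryable transaction is dangerous: if the transaction
# is retried, the side effect happens again.  `log` stands in for any
# irreversible action.  Toy model with made-up names.

log = []

def transaction_with_io(fail_times):
    attempts = [0]
    def body():
        attempts[0] += 1
        log.append("charged customer")   # irreversible side effect!
        if attempts[0] <= fail_times:
            return False                 # simulate a validation conflict
        return True
    while not body():                    # optimistic retry loop
        pass
    return attempts[0]

attempts = transaction_with_io(fail_times=2)
# two conflicts means three executions -- and three "charges"
```

With a lock you own the critical section, so the action runs exactly once; under arbitrary retry you need to buffer side effects until commit, which is precisely what you can’t do for the cases Sriram lists.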

I love the idea of STM, particularly MVCC STM, and have spent a lot of enjoyable time musing about how I would implement it or use it in programs. Nonetheless, after reading all this, I can’t help but conclude that STM might be too limited to be of much interest to me in the things I do. It might be great for computation within a single process (Tim Sweeney’s example is a good one), but it seems to fall apart in the distributed case or where side effects like I/O are important. Maybe there are some constrained distributed cases in which the benefits would outweigh the costs. After all, many people do useful work within the messaging constraints imposed by MPI. A lot of big simulations are done by partitioning objects or physical regions between processes, having each process calculate one time-step’s worth of change for its objects/regions, then all getting together to share the results before launching the next time step. Maybe some restricted form of DSTM would allow all of this to happen more efficiently using transactions instead of barriers, and displace MPI in some cases. For the more general case, though, I remain rather skeptical.
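The time-step pattern I’m describing can be sketched with ordinary threads and a barrier – here in Python, with made-up data, standing in for what MPI does across processes:

```python
import threading

# Bulk-synchronous time-stepping: each worker owns one region of the
# state, computes a step using its neighbor's last-published value, then
# meets the others at a barrier before results are exchanged.  All data
# here is made up for illustration.

NUM_WORKERS = 4
STEPS = 3
state = [1.0] * NUM_WORKERS            # one region per worker
barrier = threading.Barrier(NUM_WORKERS)

def worker(rank):
    for _ in range(STEPS):
        # compute phase: read my region and my neighbor's (both stable)
        local = state[rank] + state[(rank + 1) % NUM_WORKERS]
        barrier.wait()                 # everyone finishes computing...
        state[rank] = local            # ...then results are published...
        barrier.wait()                 # ...and visible before the next step

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# values double each step: after 3 steps every region holds 8.0
```

A restricted DSTM would replace the two global barriers with per-region transactions, letting fast workers run ahead instead of waiting for the slowest one each step – that’s the efficiency win I mean, if it can be delivered.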