Why You Don't Need STONITH

(This started as a Hacker News discussion about an article on Advogato. The article's title/premise is "Why You Need STONITH" where "STONITH" means "Shoot The Other Node In The Head" and is an important concept in old-school HA. I might even have been present when the acronym was coined, after having used a similar one at CLaM.)

I was working on HA software in 1992. Specifically, I was working on the software from which Linux-HA copied all of its terminology and basic architecture. We ourselves were not the first, and often found ourselves copying things done even earlier at DEC, so I'm not complaining, but I want to make the point that this article from 2010 is actually a rehash of a much older conversation. As cute as the metaphor is, it gets two things seriously wrong.

(1) Fencing and STONITH are not the same thing. Fencing is shutting off access to a shared resource (e.g. a LUN on a disk array) from another possibly contending node. STONITH is shutting down the possibly contending node itself. They're quite different in both implementation and operational significance. Using the two terms as though they're interchangeable only sows confusion.

(2) You only need STONITH if you have the aforementioned possibly contending nodes - in other words, only if the same resource can be provided by/through either node. If the resources provided by each node are known to be different, as e.g. in any of the systems derived from Dynamo, then STONITH is not necessary.
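To make the distinction in point (1) concrete, here is a toy Python model. It is only a sketch: `Node`, `SharedLun`, `fence` and `stonith` are names I made up for illustration and do not correspond to any real HA package or disk-array API. Fencing changes who may touch the shared resource; STONITH changes whether the node is running at all.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a cluster member.
    name: str
    powered_on: bool = True


@dataclass
class SharedLun:
    # Hypothetical stand-in for a shared resource such as a LUN.
    allowed: set = field(default_factory=set)

    def write(self, node: Node) -> bool:
        # A write succeeds only if the node is up AND still has access.
        return node.powered_on and node.name in self.allowed


def fence(lun: SharedLun, node: Node) -> None:
    # Fencing: revoke the node's access to the resource. The node keeps running.
    lun.allowed.discard(node.name)


def stonith(node: Node) -> None:
    # STONITH: stop the node itself. The resource's access list is untouched.
    node.powered_on = False
```

After `fence(lun, b)`, node `b` is still up and can keep doing anything that doesn't involve that LUN. After `stonith(a)`, node `a` is gone entirely even though the LUN would still accept it.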

To elaborate on that second point, the problem STONITH addresses is one of mutual exclusion. It might not be safe for the resource to be available through two nodes, because it could lead to inconsistency or because they can't both do a proper job of it simultaneously. As in other contexts, mutual exclusion is a useful primitive but often not the optimal one to use. In general it's better to avoid it by avoiding the kinds of resource sharing that make it necessary. That's why "shared nothing" is the most common model for such systems designed in the last decade or more, and they don't need STONITH unless they've screwed up by not fully distributing some component (such as a metadata server for a distributed filesystem).
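As a rough illustration of how resource sharing can be avoided, the sketch below assigns every key to exactly one node by hashing. The `owner` function is hypothetical, and it uses plain modulo hashing rather than the consistent hashing that Dynamo-style systems actually use. It also assumes every node sees the same membership list, which a real system has to arrange by other means.

```python
import hashlib


def owner(key: str, nodes: list[str]) -> str:
    """Return the single node responsible for `key` (illustrative only)."""
    # Sort so the answer does not depend on the order the caller lists nodes in.
    members = sorted(nodes)
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(members)
    return members[index]
```

Because any node can compute the same answer for a given key, two nodes never both believe they should serve it, so there is nothing to exclude each other from.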
