(This started as a Hacker News discussion about an article on
Advogato. The article's title/premise is "Why You Need STONITH" where
"STONITH" means "Shoot The Other Node In The Head" and is an important concept
in old-school HA. I might even have been present when the acronym was coined,
after having used a similar one at CLaM.)

I was working on HA software in 1992. Specifically, I was working on the
software from which Linux-HA copied all of its terminology and basic
architecture. We ourselves were not the first, and often found ourselves
copying things done even earlier at DEC, so I'm not complaining, but I want to
make the point that this article from 2010 is actually a rehash of a much older
conversation. As cute as the metaphor is, it gets two things seriously wrong.

(1) Fencing and STONITH are not the same thing. Fencing is shutting off access
to a shared resource (e.g. a LUN on a disk array) from another possibly
contending node. STONITH is shutting down the possibly contending node itself.
They're quite different in both implementation and operational significance.
Using the two terms as though they're interchangeable only sows confusion.

(2) You only need STONITH if you have the aforementioned possibly contending
nodes - in other words, only if the same resource can be provided by/through
either node. If the resources provided by each node are known to be different,
as e.g. in any of the systems derived from Dynamo, then STONITH is not
needed.

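
To make the two points above concrete, here is a minimal Python sketch of a toy
two-node model. It is not how any real HA stack implements either operation;
the names Node, SharedLun, fence, stonith and needs_mutual_exclusion are all
hypothetical, invented for illustration. It assumes fencing can be modeled as
revoking one node's access to one shared resource, STONITH as powering the node
off, and "possibly contending" as an overlap in what two nodes can serve.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Node:
    """A cluster node (hypothetical model, not a real HA API)."""
    name: str
    powered_on: bool = True
    # Names of the resources this node is able to serve.
    can_serve: Set[str] = field(default_factory=set)


@dataclass
class SharedLun:
    """A shared resource, e.g. a LUN on a disk array."""
    name: str
    # Names of the nodes currently allowed to issue I/O to this LUN.
    allowed: Set[str] = field(default_factory=set)

    def accepts_io_from(self, node: Node) -> bool:
        # A powered-off node cannot issue I/O regardless of the access list.
        return node.powered_on and node.name in self.allowed


def fence(lun: SharedLun, node: Node) -> None:
    """Fencing: shut off the node's access to one shared resource.

    The node itself keeps running and can still do unrelated work.
    """
    lun.allowed.discard(node.name)


def stonith(node: Node) -> None:
    """STONITH: shut down the possibly contending node itself."""
    node.powered_on = False


def needs_mutual_exclusion(a: Node, b: Node) -> bool:
    """Two nodes can only contend if some resource can be served by both."""
    return bool(a.can_serve & b.can_serve)


if __name__ == "__main__":
    a = Node("a", can_serve={"lun0"})
    b = Node("b", can_serve={"lun0"})
    lun0 = SharedLun("lun0", allowed={"a", "b"})

    print("shared LUN, contention possible:", needs_mutual_exclusion(a, b))
    fence(lun0, b)
    print("after fencing, b is up:", b.powered_on,
          "and may write:", lun0.accepts_io_from(b))
    stonith(b)
    print("after STONITH, b is up:", b.powered_on)

    # Shared-nothing layout: each node serves a different resource.
    c = Node("c", can_serve={"shard-1"})
    d = Node("d", can_serve={"shard-2"})
    print("shared nothing, contention possible:", needs_mutual_exclusion(c, d))
```

In this toy model the fenced node stays powered on and merely loses access to
the one LUN, whereas the STONITH'd node loses everything. When the two
can_serve sets are disjoint, there is nothing to exclude in the first place.
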
To elaborate on that second point, the problem STONITH addresses is one of
mutual exclusion. It might not be safe for the resource to be available through
two nodes, because it could lead to inconsistency or because they can't both do
a proper job of it simultaneously. As in other contexts, mutual exclusion is a
useful primitive but often not the optimal one to use. In general it's better
to avoid it by avoiding the kinds of resource sharing that make it necessary.
That's why "shared nothing" is the most common model for such systems designed
in the last decade or more, and they don't need STONITH unless they've screwed
up by not fully distributing some component (such as a metadata server for a