This post is actually about an anti-pattern, which I’ll call the “grazing” pattern. Code wanders around, consuming new bits of information here and there, occasionally excreting new bits likewise. This works well “in the small” because all of the connections between inputs and outputs are easy to keep in your head. In code that has to make complex and important choices, such as what response to a failure will preserve users’ data instead of destroying it, such a casual approach quickly turns into a disaster. You repeatedly find yourself in some later stage of the code, responsible for initiating some kind of action, and you realize that you might or might not have some piece of information you need based on what code path you took to get there. So you recalculate it, or worse you re-fetch it from its origin. Or it’s not quite the information you need so you add a new field that refers to the old ones, but that gets unwieldy so you make a near copy of it instead (plus code to maintain the two slightly different versions). Sound familiar? That’s how (one kind of) technical debt accumulates.

Yes, of course I have a particular example in mind – GlusterFS’s AFR (Advanced File Replication) translator. There, we have dozens of multi-stage operations, which rely on up to maybe twenty pieces of information – booleans, status codes, arrays of booleans or status codes, arrays of indices into the other arrays, and so on. That’s somewhere between one and ten thousand “is this data current” questions the developer might need to ask before making a change. There’s an awful lot of recalculation and duplication going on, leading to something that is (some coughing, something that might rhyme with “butter frappe”) hard to maintain. This is not a phenomenon unique to this code. It’s how all old code seems to grow without frequent weeding, and I’ve seen the pattern elsewhere more times than I can count. How do we avoid this? That’s where the title comes in.

  • Gather
    Get all of the “raw” information that will be relevant to your decision, in any code path.
  • Prepare
    Slice and dice all the information you got from the real world, converting it into what you need for your decision-making process.
  • Cook
    This is where all the thinking goes. Decide what exactly you’re going to do, then record the decision separately from the data that led to it.
  • Eat
    Execute your decision, using only the information from the previous stage.

The key here is never go back. Time is an arrow, not a wheel. The developer should iterate on decisions; the code should not. If you’re in the Cook or Eat phase and you feel a need to revisit the Gather or Prepare stages, it means that you didn’t do the earlier stages properly. If you’re worried that gathering all data for all code paths means always gathering some information that the actual code path won’t need, it probably means that you’re not structuring that data in the way that best supports your actual decision process. There are exceptions, I’m not going to pretend that a structure like this will solve all of your complex-decision problems for you, but what this pattern does is make all of those dependencies obvious enough to deal with. Having dozens of fields that are “private” to particular code flows and ignored by others is how we got into this mess. (Notice how OOP tends to make this worse instead of better?) Having those fields associated with stages is how we get out of it, because the progression between stages is much more obvious than the (future) choice of which code path to follow. All of those artifacts that lead to “do we have that” and “not quite what I wanted” sit right there and stare you in the face instead of lurking in the shadows, so they get fixed.