I hate for my brain to be idle. I always have a few “background tasks” running, and during those in-between moments when I’m not actively thinking about something else I keep coming back to them – during my morning routine, while eating lunch or working out, just before I go to sleep, etc. Often these tasks will have to do with work. Sometimes they’ll be “rehearsals” of my next email or blog/forum post. Sometimes they’re just puzzles and conundrums that I’ve picked up here and there, or other stray thoughts. Most often, though, they have to do with my current non-work technical line of thought. Lately that has meant the code verification project I blogged about a few days ago. I’m converging on some more concrete ideas about that, but I don’t think I’m ready to put those into words yet (that background task is still running). Meanwhile, here are some thoughts about a related issue: code coverage.

There seem to be a lot of misunderstandings about code coverage. At first, many people think that if you’ve achieved 99% coverage you’re in really good shape. When pressed, those people will admit that a single line you missed could still contain a disastrous bug, so if you have 100K lines of code that’s still a thousand possibilities for something really horrible to happen. If you press further you might get an admission that even 100% coverage doesn’t mean the code is bug-free, but the reasons are usually kind of vague. “Well,” someone might say, “the code might perfectly express what you told it to do, but you might have told it to do the wrong thing.” The more astute might notice that if the code and the unit test are both deficient in the same way, you can get 100% coverage even though the code only does half of what it’s supposed to. That’s why some people suggest having different people write the code and the unit tests, to reduce the probability that the same blind spot will afflict both. Those are good observations, but I think they pale in comparison to the most common reason why 100% coverage is only a starting point: combinations of code matter. Ten pieces of code might each be correct in themselves (for some value of “correct”), but if they can combine in a hundred different ways you need to ensure that all hundred combinations behave correctly. Failure to consider this is, in my opinion, the #1 reason software in general is so fragile. In case anyone’s wondering what I mean by combinations of code, consider the following fragment.

#include <stdlib.h>  // for malloc/free

#define THING_SIZE 100  // any size will do for this example

void *thing;

void *
AllocateIt (void)
{
    thing = malloc(THING_SIZE);
    return thing;
}

void
FreeIt (void)
{
    free(thing);
    thing = NULL;
}

It’s trivial to create a unit test that will achieve 100% coverage for this:

void
UnitTest (void)
{
    AllocateIt();
    FreeIt();
    // ...plus whatever checks you want on the final state.
}

That unit test completely fails to capture some of the likeliest, most disastrous, and most interesting bug possibilities. If someone calls AllocateIt twice in succession we’ll leak memory; if they call FreeIt twice (or once without calling AllocateIt first) we’re in double-free territory. As written, we happen to be saved by the fact that FreeIt nulls the pointer and free(NULL) is a no-op, but remove that one assignment, or let two threads race through these functions, and we’ll probably crash. Those who have used modern static code-analysis tools will already have noticed that such tools are likely to detect both of these bugs, because they can figure out that both result from a failure to check preconditions. So you had 100% code coverage, yet analysis still revealed bugs: real, serious bugs, not theoretical or trivial ones. That “100% coverage” turned out to be pretty meaningless, didn’t it? And remember, that’s just two trivial functions. Make them more complex, put them in a context of many more functions calling each other in myriad ways, and you can still have 100% coverage alongside hundreds of serious bugs. Coverage (as traditionally measured) is just the beginning, not the final goal. The sketches below show what it might look like to start testing the combinations instead.
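To make that concrete, here’s a rough sketch of sequence-oriented tests for the fragment above (assumed to be in scope). The test names and the idea of running them under a leak detector are my own additions, not part of the original example; the point is just that each test exercises a combination of calls rather than one function in isolation.

#include <assert.h>  // for the sanity checks below

void
TestDoubleAllocate (void)
{
    void *first = AllocateIt();
    void *second = AllocateIt();  // "thing" is overwritten; "first" is now leaked
    assert(first != NULL && second != NULL);
    // A leak detector (e.g. running this under valgrind) flags "first" here;
    // line coverage never will.
    FreeIt();
}

void
TestDoubleFree (void)
{
    AllocateIt();
    FreeIt();
    FreeIt();  // survives only because FreeIt nulled the pointer and free(NULL) is a no-op
}

void
TestFreeFirst (void)
{
    FreeIt();  // no allocation yet; safe here only by accident of zero-initialized globals
    AllocateIt();
    FreeIt();
}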
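And here, equally sketchy, is what checking those preconditions might look like. This is my guess at the sort of fix a static-analysis tool would steer you toward, not the output of any particular tool:

void *
AllocateIt (void)
{
    if (thing != NULL) {
        return NULL;  // precondition: no live allocation; refusing prevents the silent leak
    }
    thing = malloc(THING_SIZE);
    return thing;
}

void
FreeIt (void)
{
    if (thing == NULL) {
        return;  // precondition: something to free; don't lean on free(NULL) being a no-op
    }
    free(thing);
    thing = NULL;
}

Note that even these checks do nothing about two threads racing through the same global; combinations across threads are yet another axis that coverage numbers never see.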