Code Coverage

I hate for my brain to be idle. I always have a few “background tasks” running, and during those in-between moments when I’m not actively thinking about something else I keep coming back to them – during my morning routine, while eating lunch or working out, just before I go to sleep, etc. Often these tasks will have to do with work. Sometimes they’ll be “rehearsals” of my next email or blog/forum post. Sometimes they’re just puzzles and conundrums that I’ve picked up here and there, or other stray thoughts. Most often, though, they have to do with my current non-work technical line of thought. Lately that has meant the code verification project I blogged about a few days ago. I’m converging on some more concrete ideas about that, but I don’t think I’m ready to put those into words yet (that background task is still running). Meanwhile, here are some thoughts about a related issue: code coverage.

There seem to be a lot of misunderstandings about code coverage. At first, many people think that if you’ve achieved 99% coverage you’re in really good shape. When pressed, those people will admit that a single line that you missed could still contain a disastrous bug, so if you have 100K lines of code that’s still a thousand possibilities for something really horrible to happen. If you press further you might get an admission that even 100% coverage doesn’t mean the code is bug-free, but the reasons are usually kind of vague. “Well,” someone might say, “the code might perfectly express what you told it to do, but you might have told it to do the wrong thing.” The more astute might notice that if the code and the unit test are both deficient in the same way then you can get 100% coverage even though the code only does half of what it’s supposed to. That’s why some people suggest having different people write the code and the unit tests, to reduce the probability that the same blind spot will afflict both.

Those are good observations, but I think they pale in comparison to the most common reason why 100% coverage is only a starting point: combinations of code matter. Ten pieces of code might each be correct in themselves (for some value of “correct”) but if they combine in a hundred different ways you need to ensure that all hundred combinations behave correctly. Failure to consider this is, in my opinion, the #1 reason software in general is so fragile. In case anyone’s wondering what I mean when I talk about combinations of code, consider the following fragment of code.
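
Here’s a minimal stand-in for that fragment (the names and details are invented for illustration; the shape is what matters). Two tests, f(1, 1) and f(0, 0), execute every line and every branch of this function, yet f(0, 1) still dereferences a null pointer:

#include <stddef.h>

static int stored_value = 42;

int f(int a, int b)
{
    int *p = NULL;

    if (a)              /* both directions covered by f(1, 1) and f(0, 0) */
        p = &stored_value;

    if (b)              /* likewise covered in both directions */
        return *p;      /* blows up for the combination a == 0, b != 0 */

    return 0;
}

Each piece is fine on its own; it’s the untested combination that bites you.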

More Fluff

Today’s “awww….” picture of the day comes to us courtesy of the Boston Herald. Check out Beau T. Giraffe.

Twisted Titles

I’m sure I’m not the first parent to get tired enough of certain children’s books to begin thinking of ways to “spice them up” a little. Here are some of the hypothetical titles I’ve come up with so far.

  • Richard Scarry’s Gang Headquarters
  • Thomas the Tank Engine Goes to Iraq
  • Put Down That Chainsaw, Amelia Bedelia

Any others? Let’s try to keep this no worse than R-rated, because “Busy Busy Bordello” and variants are just too easy.

Unified Verification

As I’ve hinted a few times before, one of the projects I’ve really wanted to pursue for a while is a new kind of program to verify code correctness. Better tools to find bugs after the fact are all well and good, but it’s even better if you can find bugs before the code even runs — especially if it’s done as part of an automated process such as a nightly build. Two kinds of checkers currently exist to serve this need.

  • Protocol checkers such as Murphi are really good for detecting errors in high-level logic, including logic that involves a lot of concurrency. The drawback is that they work with an algorithm expressed in their own special-purpose language instead of real code. Since there’s no connection between the version of a protocol that you’ve verified and the one you’ve actually implemented, bugs can still creep in. I once implemented my own protocol checker largely to address this, but it was actually a pretty lame attempt.
  • Code checkers such as the Stanford checker (later commercialized as Coverity’s Prevent and Extend products) or Smatch can be thought of as “lint on steroids” but that description doesn’t really do them justice. The code- and data-flow analysis they do can actually be very sophisticated; equipped with the right information about when transitions occur and which are valid, they can uncover many kinds of bugs — in real code — that lint never dreamed of. Weaker examples of the breed analyze only within a function, while stronger ones can analyze between functions as well … but still only within a single continuous code path, and that’s the drawback to this class of tool. My favorite example of when this is insufficient involves code that allocates a structure before it sends a network request and then frees it when the reply comes back or a timeout occurs. This class of tool cannot distinguish between the legitimate case where the structure gets freed in the reply/timeout path (which is in effect a separate execution of the program) and the bug case where it never gets freed at all. Similarly, it can’t deal with control flows that jump between multiple concurrent threads.
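
To make that request/reply example concrete, here’s a rough sketch of the shape I’m talking about (the type and function names are invented for illustration, not taken from any real code base):

#include <stdlib.h>
#include <errno.h>

struct pending_req {
    int   id;
    void *payload;
};

/* Invented stand-ins for the real transport and bookkeeping code. */
void send_on_wire(struct pending_req *req);
void track_pending(struct pending_req *req);
struct pending_req *untrack_pending(int id);

/* Request path: allocate, send, return.  To a single-path checker this
 * looks like a leak, but it isn't one -- the free happens elsewhere. */
int issue_request(int id)
{
    struct pending_req *req = malloc(sizeof(*req));
    if (!req)
        return -ENOMEM;
    req->id = id;
    req->payload = NULL;
    track_pending(req);
    send_on_wire(req);
    return 0;
}

/* Reply/timeout path: in effect a separate execution of the program,
 * and the place where the structure is legitimately freed.  If some
 * case never reaches this handler the structure leaks, and no analysis
 * confined to a single code path can tell those two outcomes apart. */
void on_reply_or_timeout(int id)
{
    struct pending_req *req = untrack_pending(id);
    if (req)
        free(req);
}

Correctness here is a property of the pair of paths plus the protocol that connects them, which is exactly what neither class of tool sees as a whole.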

OK, enough background. What do I propose to do about any of this?

RAD Lab

Somehow, despite all of the technical-news sources I’m plugged into, I ended up reading about one of the day’s most interesting developments in the Boston Globe. Apparently Google and Microsoft have combined to fund the Reliable Adaptive Distributed systems Laboratory, staffed by folks from UC Berkeley. It’s interesting that Google and Microsoft are cooperating on anything, but to me the list of people involved from Berkeley is more interesting. David Patterson should need no introduction — co-inventor of RISC and RAID, co-author of the best books ever on computer architecture, Berkeley CS and ACM SIGARCH chair, winner of more awards for both research and teaching than I can list here. I seriously cannot think of anyone in computing who has earned greater credibility or respect, and by the way he’s a heck of a nice guy too. I recognize most of the other faculty from the Tahoe retreats that I was privileged to attend a few years back, and they’re all pretty high-calibre as well. This is a group worth watching.

Bug of the Day

Yesterday I chased down a bug that turned out to be caused by code approximately like the following.

lock();
for (;;) {
    req = find_completed_request();
    if (req) {
        unlock();
        return req;
    }
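    /* nothing has completed yet: drop the lock, block until the
     * next completion arrives, then re-check */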
    unlock();
    wait_for_next_completion();
    lock();
}

The problem with this code shows up when it’s called with no requests outstanding. Since none are outstanding, none will complete, and there’s no guarantee that any new ones will be issued — and in the corresponding real code none would be. Even more annoying was that the wait function (within the Linux kernel) was implemented in a way that would stop the debugger cold when it tried to attach to the process, so then I had a hung debugger and still no way of seeing where the process was stuck. Fortunately I had enough context to lead me to this function, and I’m pretty good at finding bugs by inspection.

This particular anti-pattern is one I’ve encountered many times before, to the point where it would probably make my Ten Least Wanted list. What’s particularly interesting about it is that it’s an example of a bug that static code checkers are very nearly helpless against. (Yeah, I ended a sentence with a preposition. Just try to rewrite that sentence so I don’t … or just get over it.) Firstly, the code checker would need to understand in general about multiple code paths (or instances of the same code path) executing simultaneously, and I’ve yet to see one that does. Protocol checkers are good at that, but this is a code-level problem. Secondly, the code checker would need to understand enough about the possible sequences of calls to recognize this particular indefinite-postponement problem without raising false alarms about similar cases where a request does in fact complete and end the wait. As it turns out, I have some ideas about how to fuse protocol-level and code-level checking to address this kind of thing, and I’ll be writing about those soon, but for now I’ll just gripe about people who unintentionally create infinite loops because they didn’t properly consider whether those loops’ termination conditions would always be met.

By the way, refactoring the above fragment so it doesn’t exhibit this problem is left as an exercise for the reader.
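
For anyone who wants to compare answers afterward, here’s one possible shape for the fix (the any_requests_outstanding() helper is invented, and I’m ignoring the separate question of a completion slipping in between the unlock and the wait): check whether anything is actually outstanding before committing to the wait, so the loop can give up instead of blocking forever.

lock();
for (;;) {
    req = find_completed_request();
    if (req) {
        unlock();
        return req;
    }
    /* if nothing is outstanding, nothing will ever complete */
    if (!any_requests_outstanding()) {
        unlock();
        return NULL;
    }
    unlock();
    wait_for_next_completion();
    lock();
}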

Sensitive Teeth

I’m mondo busy right now, so I haven’t been and probably won’t be writing much, but here’s a quick science tidbit. Apparently narwhal tusks are sensory organs that might detect such things as salinity and barometric pressure. It’s not quite as weird as electroreception but it’s still pretty cool.

British Cook Syndrome

Yet another side topic that has come up in recent conversations is that of code layering. In a way, it’s amazing that the idea would even need to be discussed with anyone past about their sophomore year in CS, but apparently it does. I wouldn’t expect anyone to say that modularity is stupid, or that “divide and conquer” was a dumb idea, but someone actually did trot out the old “layering is for cakes, not software” brainturd the other day. Leaving aside that I prefer unlayered cakes, I responded thus:

It’s a good thing there are smarter people than you, who don’t think in simplistic slogans. Collapsing layers is a common trick in the embedded world, to wring every last bit of performance out of a CPU- or memory-constrained system at the cost of adaptability. In a system that is constrained by neither CPU nor memory and where future extension should be expected, it’s a bad approach. If networking folks had thought as you do, it would be a lot harder to implement or deploy IPv6 or the current crop of transport protocols, not to mention packet filters and such that insert themselves between layers. In other words, it would be harder for technology to progress. In storage the same story has played out with SCSI/FC and PATA/SATA/SAS.

Anybody who has done serious work in networking knows that network-layer and transport-layer expertise are rarely combined in one person, and that there are several other areas of expertise both above and below those two. In storage a good team will often have some people who know the low-level Fibre Channel glop and others who know the (slightly) higher-level SCSI-command-set stuff. There are HBA-driver developers and generic-disk-driver developers and volume-manager developers and filesystem developers, again with more above and below. Why? Because there’s enough complexity in any one of these areas to keep even a sharp mind fully occupied. One of the most important aspects of layering, besides its obvious necessity in making it possible to test software properly, is that it allows each area to be covered by an expert in that area. It allows all of that expertise to be combined effectively not only within an organization but across organizations as well. It’s like that old joke:

Is Sun the new DEC?

Even as I was defending ZFS in my previous post, some of my disagreements with folks at Sun seem to have flared again, and I got to asking myself the question in the title. Back in the day, Digital Equipment Corporation was quite an interesting beast. On the one hand, they had a lot of really cool technology and a tremendous capacity to create more. I’ve often sung the praises of work done at DEC during that era, and to this day consider experience there to be a significant plus on a resume. On the other hand, DEC always had a problem getting actual revenue-generating work out of many of its employees. For one thing, many old-timers were notorious for seeming to spend all day conversing on mailing lists and via VAX Notes instead of designing, coding, and debugging. Furthermore, a lot of people even in mainstream engineering groups seemed to be involved in projects with little or no commercial potential.

I’m not against research and innovation, mind; I just think it should be done by people officially dedicated to it full time, using schedules and metrics and such different from those used for product development. I think companies should define a resource level for research, I think that level should be higher than is de facto the case at most companies today, and I think people should rotate between research and development groups. One of my pet peeves at EMC was that they didn’t know how to handle research and so basically stopped doing any. However, none of that means that people currently assigned to a product-development group should try to make up for any such lack themselves by haring off on every project that catches their fancy. The only thing worse than individuals doing it is when people who are supposedly in a leadership position in mainstream engineering do it and drag whole teams along with them. If they want to strike out in that kind of a direction they should seek an assignment to a research group where the project can be managed appropriately. If the idea pans out it can be fed back into engineering, where it can again be managed appropriately – this time toward driving revenue.

Fun With SSH

Today I worked from home because of the weather. Fortunately, Revivio is very well set up for this. Almost everything in the lab is set up with remote consoles and even remote power, and most of the connections we really care about are through switches that also allow remote management so we can simulate most kinds of failures just by turning ports on and off. Yes, we can and do even script such things for testing. As a result, most people can go for weeks without needing to step into the lab. Add the ability to connect from home, and the result is that it’s usually possible to do from one’s living room or home office things that at most companies would require physically sitting in the lab. It’s quite nice.

The situation is not quite perfect, though. Revivio does have a VPN, which I use from Windows quite happily for short stints. However, for serious programming I prefer to be in Linux, and I really don’t feel like installing and configuring the VPN under Linux (which is less well supported by our IT staff) just to do a couple of things that could be done other ways. Here are a couple of the more interesting examples, using ssh.
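
To give the flavor (host names and ports here are invented; the real ones are different): a local port forward puts a lab machine’s ssh port on localhost, and a dynamic forward gives you a SOCKS proxy for reaching web interfaces on the lab network, all without touching the VPN.

# Forward local port 2222 to a lab machine's ssh port, via the gateway:
ssh -N -f -L 2222:lab-host.example.com:22 me@gateway.example.com
ssh -p 2222 localhost

# Run a SOCKS proxy over ssh; point a browser at localhost:1080 to
# reach web interfaces on the lab network:
ssh -N -f -D 1080 me@gateway.example.com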