My last technical article was about debugging, because that’s what I had been doing a lot at the time. Now I’m going to discuss rewriting code, for similar reasons. It’s no secret that the code for any multi-person project is likely to represent different levels of quality. Regardless of whether the median quality is stellar or abysmal, the road to improvement often means picking one of the components at the bottom and either eliminating the need for it or overhauling it until its quality is at or near the top. Because a chain is only as strong as its weakest link, even a few iterations of this process can yield substantial improvements in overall product quality. Much of the advice I’ll give applies also to writing new code, but the emphasis is a little different and plenty of books have already been written on that topic so I’ll focus on the areas that are of particular interest when rewriting old code.

The first part of rewriting code is deciding whether you should do it at all. As Joel Spolsky and others have written quite eloquently, it’s often harder to read others’ code than to write your own and rewriting carries an inherent risk of reintroducing bugs that were already fixed in the older version. I don’t agree with Joel that you should never rewrite, but caution is warranted. In my experience there are two kinds of people who rewrite code: immature programmers who do it eagerly to prove they’re smarter than the last guy and who almost invariably do it wrong, or older programmers who do it reluctantly because it’s demonstrably good for the product and who tend to get it right. Rewriting code because it’s easier than understanding it as it is, or letting others rewrite it for that reason, is almost guaranteed to be a serious mistake. Therefore, my rule of thumb is that you should never rewrite code that you don’t already understand well enough to feel confident fixing bugs in it without a rewrite.

That leads very neatly into my first real piece of advice: know the code’s history. As you study someone else’s code, you’ll inevitably find pieces that don’t seem to make sense. Some of them will be artifacts of the previous author simply not thinking the same way you do. Some will be remnants of old experiments or requirements. Some will be fixes or workarounds for bugs that you haven’t thought of yet, and those are the most important. To know which is which, look at the revision history. You certainly don’t have to look at every single line in every single revision, but most source-control systems nowadays have a way to identify when each line first got added to a file. Use that to find out when each strange-looking piece of code first appeared, then examine appropriate context (e.g. bug reports, old email) to understand why. This will not only help you understand the code, but it will also give you a head start on developing unit and regression tests.

The next thing you should do, like a surgeon, is prepare the operating area. With code, this means narrowing the interfaces between the component you’re about to rewrite and the rest of the system. If you reduce the number of calls/messages or arguments at the boundaries, you’ll have fewer things to check each time you rearrange what’s inside. This “preparation phase” is also a good time to do all of your generic cleanup work such as reindenting or making all of the variable names consistent. Stick to lightweight or superficial changes at this point, but you might as well make things more readable for the next couple of phases.

The last piece of preparation is adding layers. More layers are better at this stage; you can always collapse them back when you’re done. If you have a function that operates on a collection of objects, break it apart into a function that operates on only one object and a function that just contains the loop over the collection. If you have a function that does three exchanges with another component, break it into one function for each exchange plus one to tie them together. One particular thing to look for and eliminate is redundant code. Lie and Engler have observed that dead or redundant code is often indicative of bugs, and in my experience this is particularly true of code that was obviously cut and pasted several places. Even if the repeated fragments of code don’t seem to match neat conceptual boundaries, create functions to contain them in one place apiece and call those functions from elsewhere instead of repeating the code. Redundant data structures deserve the same scrutiny and treatment as redundant code at this point.

Now you should have code that you understand pretty well and which is in a fairly readable form. This is a good time to step back and make a specific plan for what you’re going to change. All of your familiar design methodologies and instincts apply here, so use them to develop a new picture of how the component should work. One particular thing to look for is special cases in the old code. Special cases accumulate over time to deal with bugs or requirements from late in the code’s life cycle, often contrary to its original design. This is a sign that the original design was flawed or incomplete, and here’s your chance to fix it. In general, the number of special cases in a “mature” component can be reduced by at least half with a design that reflects the domain experience you now have. Yes, that means the new code will probably require fewer lines to do at least as much as the old; if that’s not the case, you’re probably looking at a rewrite that should never have happened.

With a plan in hand, now you can do the real work of rewriting. I find that the best approach is to work from the external interfaces inward. Take one externally callable function and re-express it in terms of the data structures and primitive functions that resulted from your earlier preparation. Keep the old version around for reference, and particularly to keep an eye out for the edge conditions that the new version must handle as well as it did. Test the new function thoroughly, then throw out the old version. Lather, rinse, repeat. When you run out of externally callable functions, look at internal functions such as those called periodically from worker threads, and give them the same treatment. Keep going until the new code matches the vision developed in the last step.

The second to last part of a good rewrite is to undo some of your earlier prep work. Remember when I said you could always collapse extra layers later? Do it now. If a function in one layer only ever gets called from one place, and you can only really imagine it ever being called from that one place, re-inline it. Reexamine all of your data structures. Chances are good that at least a few objects/fields aren’t used at all any more, and others can be trivially eliminated or rearranged into a more rational order.

Finally, you’re ready to wrap up. Go through the code looking for all of the notes you left for yourself during earlier stages. You were keeping notes, weren’t you? I find that there are always dozens of things I find that I want to get back to later, and I just mark them with “TBD” comments. Before you declare victory, you should do one last scan to make sure return values are always checked appropriately, memory is always freed properly, and logging is neither excessively verbose nor excessively terse. Likewise, comments should be present in appropriate numbers and should reflect the new reality of the code. When you’ve done these code equivalents of sanding and staining and putting away your tools, you’re really done. Congratulations on a rewrite done the right way.