Broken Web Forums

The next time I find a web message board that glitches out when I try to post, and then – here’s the important part – throws away my text when it gets the error so I have to start all over from scratch, I swear I’m going to DoS both the forum site and the site for whoever wrote the software. Today’s offender is the Manifest Destiny Forum, but there’ve been many others. BTW, Ron Hiler comes across as a total Nazi, constantly posting “that’s just the way it is” kinds of responses and moderating just about every section/thread in the forum. As a (predictable) result, all of the posts seem to have a similar “Civ sux, MD roolz” flavor; I don’t know why I even wanted to bother posting in such an environment.

Yeah, I know, it’s a common enough problem that I should always save my text in an editor window somewhere before I hit the “submit post” button, but sometimes I forget, OK? That’s no excuse for wasting the time I took to write what is very often a thoughtful and detailed post.

Formal Validation

All this talk about bug free code is kind of amusing. While I am unhappy about the number and type of bugs found in Civ 3, I do realize that bug free code is usually impossible. There are ways to prove a program mathematically correct, called formal methods I think, but these proofs become exponentially expensive to perform and probably are not practical for game development or any commercial software for that matter.

Exactly correct. Such programs are called verifiers, validators, or model checkers. I’ve written one, and used a couple of others. There are several major problems with the entire genre, including but not limited to the following:

  • As you point out, the state space grows very quickly to the point where the time required for validation exceeds the lifetime of the universe. It happens sooner than you think. State space explosion is the #1 problem in this field, and quite a few brilliant researchers have spent their whole careers trying to address it…sadly, to little practical avail.
  • Successful validation only proves that the program meets the requirements as given to the validator as input, and that input can have its own bugs.
  • Both the algorithm and the requirements must usually be specified in a validator-specific language, which inevitably means that what you’re validating is not the actual running code. The only system I know of that applies serious validation techniques to real code is Dawson Engler’s MC, and that technology is still in its infancy.
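
To make the state-space point in the list above concrete, here is a minimal sketch of an explicit-state checker in Python. It is not the validator I wrote, nor any real tool; the names explore, make_successors and count_states, and the toy model itself (a handful of independent processes, each cycling through a fixed number of program-counter values), are assumptions made up purely for illustration.

```python
from collections import deque


def explore(initial, successors, invariant):
    """Breadth-first search over every state reachable from `initial`.

    Returns (states_seen_so_far, first_violating_state_or_None).
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return len(seen), state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen), None


def make_successors(steps):
    """Hypothetical toy model: a state is a tuple of program counters.

    Any single process may advance by one step at a time, wrapping
    around after `steps` values, so all interleavings are explored.
    """
    def successors(state):
        for i, pc in enumerate(state):
            yield state[:i] + ((pc + 1) % steps,) + state[i + 1:]
    return successors


def count_states(processes, steps):
    """Count the reachable states of the toy model (no property checked)."""
    initial = (0,) * processes
    visited, _ = explore(initial, make_successors(steps), lambda s: True)
    return visited


if __name__ == "__main__":
    # Each added process multiplies the state count by `steps`.
    for n in range(1, 6):
        print(n, "processes:", count_states(n, 10), "states")
```

In this toy model the processes don't interact at all, so the count is simply steps raised to the number of processes: ten steps and five processes is already 100,000 states, and every additional process multiplies that by ten. Real programs have far more than ten steps per thread, plus data, which is why the wall arrives sooner than you think.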

I apologize for interjecting a little software-engineering reality into what is obviously a very satisfying (but uninformed) pig-pile, but it just seemed necessary. The perfect is the enemy of the possible, and I’m really big on exploring the possible. Maybe we can get back to something related to games now, instead of pretending to be software-engineering experts.

My own validator, to which I referred above, is described here.

Bug Prioritization

If Firaxis knew that e.g. Air Superiority was broken then IMHO that is unprofessional of them to release it. This is a “showstopper” bug IMHO because the software is not conforming to specification (i.e. it is not behaving as the game rules suggest).

No, that’s not a show-stopper. Nobody’s going to die from an air-superiority bug in a game, and applying medical-equipment or flight-control standards to a game is a false analogy. We don’t know if fixing the bug might have caused Firaxis to miss a contractual deadline, which would have “stopped the show” in a much more concrete way.

BTW, have you ever seen any software that conformed to the spec in every respect no matter how trivial? I’ve seen a lot of software in more than a decade in this industry, and I don’t think I’ve ever seen any that met that unreasonable standard. Half the specs I’ve seen were self-contradictory in some subtle way or other (specs have bugs too), so it wouldn’t even be possible. That’s not a workable definition of “show stopper”; such a definition must necessarily include impact/severity, frequency of occurrence, cost to fix, damage to reputation if not fixed, maybe even professional pride, but it’s never as simple as people here seem to think.

Being Bill Gates

Oh, and as to the guy who said I shouldn’t have Bill Gates’ job? If I did, your software would probably be many, many times more reliable.

…and it would also be many years from completion, not benefiting a single user anywhere. Actually no, it wouldn’t, because your version of Microsoft would have gone bankrupt years ago and you’d be looking for a job in some other industry. To adopt your own phrasing, there is NO EXCUSE for any software developer to be so hung up on self-delusional no-bugs grail quests as to interfere with the business goals of their employer. NONE. And that’s true in every other industry too.

Developer Responsibilities

It is humanly possible to produce perfect code. I have seen it done, and I have done it myself. It takes talent, dedication, focus, testing and a commitment to excellence – but it CAN be done.

It’s possible, but the time required to write a non-trivial program and verify it to be bug-free exceeds the market lifetime of the program rather sooner than some people seem to think.

The employee you want is one who is committed to the goal of producing bug-free software. The employee you don’t want is the one who thinks that buggy software is acceptable.

I disagree. The employee you want is the one who is committed to the goal of returning maximum value to your company. That usually goes hand in hand with minimizing bugs, but not always. The obvious counterexample is when trying to kill that last bug will cause the ship date to slip past a hard deadline and result in a contract being voided. Other less extreme examples are possible too, but I hope the point is clear enough.

Ahem. I have written software, in about 9 different languages. I have written DRIVERS – heck, you might even be using one that I wrote.

Yeah, yeah. I’ve written drivers and other kinds of code that are more difficult than drivers. For three years I specialized in high-availability clustering. I’ve been paid to write as an expert on how to produce quality software. Despite all that, I admit that every non-trivial program I’ve written has had bugs – despite every validator I’ve run it through, every unit test I’ve written, and every month that highly skilled QA people (or customers) have spent torture-testing it. Any programmer who doesn’t make a similar admission is simply deluding themselves and attempting to mislead others. Even SEI capability-level 5 shops (of which there are two in the world) have produced code with serious bugs, with serious consequences.

Bug-free software can be written; it’s been done. At the very LEAST it can be what you constantly strive for. Anyone with the attitude that bugs are not only inevitable but acceptable is striving not for excellence but for mediocrity.

On this we agree. Even though it is an almost provably unattainable goal, bug-free software should nevertheless be a goal. “Bugs are inevitable” should not be used as an excuse. If a bug is identified, the proper response is “oops, we’ll fix it”. If the bug “should have been” found and fixed sooner – subject to reasonable standards of skill and diligence, plus constraints of business survival – maybe there should be an apology as well. “Stop whining” from a developer or apologist is just as counterproductive and inexcusable as “you suck” from the person who found a bug. If you ask me, we’ve been seeing far too much of both on this thread.

Several Posts About Bugs

Last night I ended up making several posts to a thread on Apolyton – a site generally devoted to Civilization and similar games – about bugs in Civ3. The conversation had gotten to be about general software-engineering, especially the possibility of writing totally bug-free code and the acceptability of shipping products with bugs. Since there are multiple posts, some quite long, including them here in their entirety would overwhelm the log you’re reading now, so I’ve put them behind links; here are the links and synopses:

Current State of Linux for the Enterprise

I haven’t posted anything about Linux for a while, so I figured I’d bring this over. The original thread is here.

Linux is as ready for the enterprise as any other offering (including those already considered to be enterprise platforms).

Wrong. It’s certainly making good progress, but it’s still quite deficient in several important areas:

  • Support for truly large block devices, or truly large numbers of devices, still lags behind most of the commercial UNIXes.
  • The SCSI stack is still a mess, lacking features, robust error handling, and overall coherency. FC drivers aren’t in great shape either.
  • The journaling filesystems available for Linux are still relatively immature.
  • The VM system is effectively only a couple of months old. We don’t really know how it will perform on many types of systems, except that it will be *horrible* on NUMA machines.
  • Linux’s error logging and general RAS functionality is still nothing like what’s provided by the commercial UNIXes.
  • High-availability clustering does exist for Linux, but at a level roughly equivalent to what AIX had in ’95 and most others by ’98 or so.

That’s far from an exhaustive list, of course. As I said, it’s making good progress, and if you’re comparing it to any flavor of Windows then it looks pretty good. In the real world of the enterprise, though, it’s just not there yet.

Breaking My NDA

OK, I’m probably breaking all sorts of rules here, but I just had to tell people about EMC’s newest product: Pocket Symmetrix. Our motto: Big Storage in a Small Package. How’d you like to keep your MP3 collection on this little baby?

The Spaghetti Machine

So you think you have a big rat’s nest of cables under your desk? Here’s mine. What you’re seeing, besides the obvious PC, is a four-node Linux cluster. Each of the mysterious-looking black boxes stacked next to the PC contains two nodes; each node contains I can’t tell you and three fast Ethernet ports. All of those ports feed into the 16-port switch hiding in the background using the shortest cables I could find, and it’s still a horrible mess.

I’ve decided to call the nodes meatball, tomato, garlic and oregano.

Slime Volleyball

It’s impossible to describe, other than to say it’s a Java-based game: Slime Volleyball.