I’m kind of itching to write about my new job, but I kind of promised I wouldn’t do that until I’d written something about the old one. Most of what I’ll have to say is negative, not because everything that happened at Revivio was bad but because – like many engineers – I tend to focus more on things that go wrong and can be fixed than on things that go right and don’t seem to require further attention. I’m sure somebody will be offended by something I say here. Oh well. It’s all but impossible to write about something that turned out so poorly without someone feeling that they’re being blamed, but blame is not my intent. What I hope is that those of us who were there will learn from our experience, and maybe someone else will learn too, and neither can happen if our mistakes remain unexamined or unremarked.

What happened at Revivio was basically attributable to three factors, in decreasing order of importance: strategy, organization, and technology. Strategy was by far the most significant. To put it simply, we chased the wrong market. Enterprise storage customers are unbelievably demanding. Many just won’t even talk to startups. At all. Ever. End of story. Others will talk . . . and talk, and talk, and talk, requiring endless justification and persuasion all up and down the management chain before you even get to the point where a normal sales cycle would start. Way too many times, our salespeople thought they were about to close a deal, then all of a sudden there was a new key player who was skeptical or outright hostile, and at best we’d have to start all over again. Even those who finally did sign often required amazing amounts of free support as part of the package, which often meant solving problems that had absolutely nothing to do with us or our product, just to prove our mettle and our commitment. Then, the contracts were written so that even after they had formally accepted the product they could throw it back in our face for no reason whatsoever and get their money back. I don’t think our salespeople or execs were incompetent to accept such terms; they were just desperate, but we still got burned by those deals.

Another strategic factor was that our product was just too new. We not only had to develop the technology for what is now known as Continuous Data Protection, but we had to create an awareness in the industry that this previously-impossible thing was now real and beneficial to storage consumers. We’ve all heard about the so-called First Mover Advantage, but I’m a big believer in the Second Mover Advantage. It’s very rare in this industry for the real originators of an idea to be the ones who profit most from it. NetApp didn’t invent network storage. EMC didn’t invent RAID. Microsoft didn’t invent windowing systems, and neither did Apple. The very energy and resources required to develop and evangelize any new idea become unavailable to exploit it commercially, leaving those who bear that expense at a disadvantage compared to latecomers who can devote 100% of their energy and resources to exploitation. One reaps, another sows. Somebody will make a lot more money from CDP than Revivio ever did. Yes, it’s natural to ask how this theory applies to “Dense Cluster Computing” but I’ll leave that answer for another day.

The second set of problems at Revivio was organizational. I was not the first or only person there to remark that Revivio seemed to be more Balkanized and have more communication problems than any company its size ever should. Management was usually off in their own world of closed-door meetings between the same dozen people – none of whom had ever so much as installed the product – but that’s normal at tech companies. The problem I’m thinking of was down in the trenches. From the early days when development was split into three groups (platform, indexing, and system management) to the late ones when it was just two (work and play) there was an amazing tendency for groups or even individual developers just to go their own way without bothering to tell others who were affected. Most developers had only a tenuous relationship with QA, and none at all with CS. Requirements and completion criteria were generally missing, or so confusing and whim-driven that simple absence would have been a blessing. It was really sad. I was made product architect partly to solve these problems within the development organization, but at this point I’d have to say I failed. Yes, I did well enough that we managed to get a product out. However, there was only so much I could do. Without innate authority even to reject checkins on the basis of technical insufficiency or inconsistency, and with only lukewarm support from those who did have such power, there was not much I could do if a developer decided to thumb his (or her) nose at the product architecture or even basic software-engineering practice. Short of jumping in and rewriting code that we had already paid someone to write, all I could do was offer warnings or criticism that fell on deaf ears . . . until the predictable bugs and integration problems generated the predictable crisis. Did I mention that we spent way too much time in crisis mode? 
Anyway, I did sometimes manage to avert some of the worst disasters, but they were always a few raindrops in a downpour.

But enough about me . . . or maybe not, because I exemplify another part of the problem. Nobody ever seemed to have a $#@! clue who was helping or hurting the company. Some people who busted their humps trying to do the right thing consistently got no credit at all and eventually got laid off. Other people who spent half their time on unapproved pet projects and the other half schmoozing just coasted right along, and often seemed to do better than they would have if they’d been more diligent. Some of my own most crucial contributions were ignored, taken for granted, or even mocked. Some other things that I did were recognized and rewarded out of all proportion to the difficulty or sacrifice involved. Overall I admit that I was among those who benefited from this corporate inability to tell good from bad, not to the extent that anyone dropped a huge bonus on me while others were being laid off or anything like that, but still. Just because I was a beneficiary rather than a victim doesn’t mean I was blind to the problem, though. If there’s one thing that was fully under Revivio’s control and that they were really bad at, it was this issue of credit and accountability and motivation.

I’ve left the technical issues until last because, frankly, I think they contributed least to the negative outcome. Everyone who worked there knows we had major issues with the third-party software we were using, from its initial selection to now. I don’t want to get in trouble for describing technology that is now the property of Symantec, but I will say that I’m referring to all of the third-party software, not just the obvious culprits that people seem to have eliminated but also some that I know are still in use. I’ve known companies to be too afflicted with the “Not Invented Here” syndrome, but Revivio almost seemed to have its opposite. We had opportunities to build something more suited to our purposes, and in some cases we expended significant resources on developing them, but in almost every case we succumbed to the old “bug fixes will be someone else’s problem” narcotic. Relatedly, we also had problems with multiple languages and paradigms within the system. Getting different parts of the system to talk, when each was written in a different language and with a fundamentally different view of the world, was often a nightmare. There were way too many communication paths as well, many of them bizarre and cumbersome by anyone’s estimation but their author’s. Way too many technical decisions were made based on who would be doing the work instead of what was best for the system, but that’s more a reflection of the already-discussed organizational problems than of anything truly technical. In the end, though, every day spent getting code to solve actual product-functionality problems seemed to require three more figuring out how to fit it into the weird little ecosystem we’d made for ourselves. Here I share blame with the other senior technical staff. All too often, the senior person who had complained the loudest about a problem was the least helpful in finding or implementing a solution, so we just ended up with churn instead of progress.

What would I have done differently? First and foremost, I would have targeted a different market. We could have had a non-highly-available system with better performance ready a year sooner than the product we actually shipped, and it would have faced fewer barriers in the market. We could have had a hundred customers by spring 2006, instead of a dozen by fall. Then we could have begun work on the scaled-up version. I would have insisted on simple but firm requirements and exit criteria. I would have had not one but two architect-level people with real authority to ensure that what got produced matched what was needed and not the developers’ own preferences. One would have been the traditional “think about all the hard problems and draw the big boxes” kind of architect. The other would have been like a super release engineer, responsible for code quality. That means staying on top of every checkin, immediately and peremptorily rejecting those that do not meet either formal or informal standards. It also means running code-analysis tools, analyzing defect trends, etc. It would be nice if these two functions could be performed by one person, but even in a small development organization it’s just too high a workload. The goal, not just in development but throughout the company, would be empiricism. Measurable results should matter more than personality or presence in the right meetings. That means measurements must be made, and people who measure up poorly must be given the opportunity to improve or leave. If Revivio had adopted that philosophy, most of the “old crew” would still be working at Hartwell Ave., finishing up the third generation of the product and planning what to do with all that money after the IPO later this year. We certainly had the technical talent to do it, and I think we probably had the business talent as well, but somehow all of that ability never crystallized into a team that could win. 
A few key decisions years ago, filtered through an organization that tended to magnify mistakes and attenuate success, ended up making all the difference.