Dumb Things People Say

So good old @dozba, in his typically endearing style, wrote about why MacOS X is an Unsuitable Platform for Web Development. His criticism of TextMate seems a bit silly, but his rants about package management remind me of my own problems with a proliferation of language-specific package systems, and “you don’t deploy to BSD” is spot-on. You would think that people would realize that developing on one platform for deployment on another can be problematic at best, but apparently the lesson needs repeating. At work, we even see glitches when people develop RHEL stuff on Fedora or vice versa, and those two are far more similar to each other than the Mac is to anything you would ever deploy in production. What was even more amazing to me was the Slashdot reaction. Even knowing – all too well – how far Slashdot has declined, some of the stupidity on parade there literally made my jaw drop. Let’s review the main point here, OK?

Developing and deploying on different platforms is only OK if you stay well inside some kind of insulated sandbox and don’t care about performance.

The first part, about sandboxes, is mostly about avoiding functional differences between the platforms. If you ever use platform-specific configuration info or ioctls or anything else that requires an additional abstraction layer to make the code even run on both platforms, then you’re on very thin ice. Nine times out of ten you’ll be giving the people who actually deploy the code on real systems a good reason to hate you. As for performance, there are two sub-points. If you’re running in that nice insulated sandbox of a JVM, then its performance warts with regard to thread and memory handling will probably outweigh the warts in the OS scheduler and virtual memory systems. To a lesser extent the same might be true if you’re using Python or Ruby or Erlang. If the performance that matters is network or storage (filesystem/LVM/disk) performance, then you won’t even have that excuse. You just cannot expect two operating systems from completely different families to exhibit the same balance and inflection points at different I/O rates, sizes, thread counts, etc., and all of the knobs for tuning around these idiosyncrasies will be different too. If performance matters at all for your application, you need to be working with the same performance ratios and tuning facilities across development and production. Does the Slashdot crowd seem to get any of this? Of course not.

Of course, you can dual-boot Linux on it or run it in VMWare. But you knew that, right?

Couldn’t be more wrong. If you’re doing anything virtual, then you’ve just made the matching-environment problem even worse, because now you’re subject to differences in both the host and guest OSes (and the hypervisor between them). Also, if you’re so addicted to running stuff in virtual machines then you’ll probably have a lot of them contending with one another and distorting the performance picture even more.

I can’t imagine writing code so finicky and unstable that it can only be cajoled into running under such a specific environment.

This guy has obviously never written anything but toy code. In the real world it’s very easy for code to run well on one platform, run poorly on others, and not run at all on still others. All you need to do is use one platform-specific feature, or rely on one platform-specific aspect of performance to guide your implementation tradeoffs. It’s writing code that runs on many platforms that can be a challenge, and code that runs well on many platforms is quite rare.

OK, so picking on Slashdotters is like shooting pithed fish in a very small bucket. The real point, though, is that all those developers using their MacBook Pros to develop code that’s supposed to run on Linux are doing their colleagues and users a great disservice. You don’t like the Linux environment as much? Too bad. Make it better. That’s how your deployment platform got to the point where you’re using it, after all. People used it, mostly liked it, and fixed/replaced the things they didn’t like so much, and added new stuff to scratch their own itches. If you use the same platform for development and deployment, then every improvement you make has twice as many chances to benefit someone else, and every improvement someone else makes has twice as many chances to benefit you.

Entreposeurs

Even before dot-bomb, I’d learned that “entrepreneur” meant two distinctly different things to two distinctly different groups of people. To one group, it meant an activity; to the other it meant an identity. It’s the difference between doing what entrepreneurs have historically done (mostly inventing things) and being like entrepreneurs have historically been – or, even more cynically, appearing like they have historically appeared. Having spent more time in startups than the vast majority of those who are or call themselves entrepreneurs, I feel a great deal of kinship with the first group. The second group? Not so much. These are the folks who come across as MBA types even if they actually have technical degrees (and sometimes non-trivial technical abilities). They dress like they believe entrepreneurs should dress, whether that fashion calls for suits or pure grunge or some blazer-and-jeans hybrid in any given month. They eat at the restaurants and drink at the bars that they think entrepreneurs are supposed to. They read the books and blogs that they believe entrepreneurs should read, they put in the hours that they believe entrepreneurs should put in (even if those hours are mostly on Twitter), they gravitate toward the office locations and layouts that they believe entrepreneurs should prefer. It’s like the old saying:

Sincerity is the key. If you can fake that, you’ve got it made.

Individuality is valued, so these folks try to cultivate it . . . but they all cultivate it the same way and end up looking more conformist than ever. Nowhere is that more apparent than in their adoption of entrepreneur jargon. Some of that jargon, such as that associated with VC/angel term sheets and such, actually has some grounding in reality. Other terms don’t, and the ones that have really been annoying me of late are “pivot” and “iterate”. I know all too well how important it is for a startup to be adaptable, but pivots and iterations are still ways of recovering from mistakes. They’re still something to avoid, not something to strive for. On average, the interval between such “about face” maneuvers should be longer than the interval between founding and whatever exit you prefer. It’s nice when a figure skater recovers from a botched landing, but if they spend their whole routine proving that ability then they’re not going to win a lot of titles. Somehow, though, many entreposeurs seem to think that being able to jump from one product idea to another is more important than being smart enough not to chase such ephemeral pseudo-markets, or executing quickly enough not to miss the boat on the first idea, or having the courage/honesty to admit that you goofed and move on to the next thing. Failure is not supposed to be a liability, right? We learn from failure, if you’re not willing to fail you’re not thinking big enough, yadda yadda . . . or maybe that was last year’s management-consultant money machine, because this year it’s apparently preferable to keep flailing around for something – anything! – that will keep the founders from having to get new business cards. After all, “founder of X” is their identity, couldn’t possibly let go of that now, could we? 

The irony is that I will almost certainly get comments from people who say the value is all in the team anyway, even though when the issue is inflated social-media-startup valuations those very same people will immediately forget the team and justify their paychecks solely by making up stuff about potential markets. It’s over here, no it’s over there, why cheat people out of dollars with three-card monte when you can play the same game for millions in Silly Valley?

Here’s a hint: don’t try to be an entrepreneur before you’ve generated any wealth for anybody. It doesn’t work. The way to be an entrepreneur is to have an idea you’re willing to stick with, and then build a business around it. If you have to put aside the starving-artist chic and actually work in the corporate trenches until you have some clue which ideas are actually worth pursuing (and haven’t already been done), so be it. That’s how every real entrepreneur I’ve known did it.

Dealing With Distributed State

At the Linux Foundation’s recent End User Summit, I had the pleasure of meeting K.S. Bhaskar from FIS. Recently he wrote an article on his blog about Eventual State Consistency vs. Eventual Path Consistency in which he has some particularly interesting things to say about different kinds of consistency guarantees.

there are applications where detection of colliding updates does not suffice to ensure Consistency (where Consistency is defined as the database always being within the design parameters of the application logic – for example, the “in balance” guarantee that the sums of the assets and liabilities columns of a balance sheet are always equal).

He then gives an example showing an apparent problem with two financial transactions and their associated service charges, across two sites while a service-charge rate change is still “in flight” between them. I originally responded there, but my reply seems to have disappeared. Maybe it got lost due to a conflict with a subsequent update. ;) In any case, I might as well respond here because I think his example highlights an important issue. I don’t think Bhaskar’s example really demonstrates the problem he had described. In the last step he says that

B detects a collision, since it observes that the data that was read by the application logic on A to compute the result of transaction P has changed on B

How could B observe such a thing? Only if it knew either the data that was read on A (i.e. the service-charge rate in effect for the transaction was included as part of the replication request) or the exact replication state on A at the time P was processed there (e.g. by using vector clocks or similar). Either way, it would have enough information to replicate the transaction in a consistent fashion.
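
To make the second option concrete, here is a minimal Python sketch of how B could use a vector clock attached to P to decide which rate was in effect on A. Everything in it is an assumption for illustration: the names `has_seen` and `rate_in_effect`, the site labels, and the idea that the rate change is identified by the site that issued it plus that site's update counter. None of it comes from Bhaskar's article or from any particular database.

```python
from typing import Dict, Tuple

# A vector clock maps each site name to the number of that site's
# updates which the clock's owner has applied so far.
VectorClock = Dict[str, int]


def has_seen(clock: VectorClock, site: str, counter: int) -> bool:
    """True if the clock's owner had applied update number `counter` from `site`."""
    return clock.get(site, 0) >= counter


def rate_in_effect(clock_at_a: VectorClock,
                   rate_change: Tuple[str, int],
                   old_rate: float,
                   new_rate: float) -> float:
    """Pick the rate A must have used, given A's clock when P was processed.

    `rate_change` is (issuing site, update counter), a hypothetical way of
    identifying the service-charge rate change.
    """
    site, counter = rate_change
    return new_rate if has_seen(clock_at_a, site, counter) else old_rate


# Hypothetical scenario: the rate change was update #5 issued at site B.
RATE_CHANGE = ("B", 5)
```

With A's clock shipped alongside P, B no longer has to guess: a clock of `{"A": 12, "B": 4}` means the old rate applied, while `{"A": 12, "B": 5}` means the new one did.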

The real problem would be if B didn’t know whether or not the rate change had reached A yet when P was processed there. That would result in B needing to distinguish between two possible states that would have to be handled differently, but with no way to make that distinction. The general rule to avoid these kinds of unresolvable conflicts is: don’t pass around references to values that might be inconsistent across systems. It’s like passing a pointer from one address space to a process in another; you just shouldn’t expect it to work. Either pass around the actual values or do calculations involving those values and replicate the result. For example, consider the following replication requests.

# $var indicates immediate substitution from the original context
# %var indicates a transaction-local variable

# Wrong: sc_rate is passed by reference and interpreted at destination
replicate transaction "transfer #zzz" {
    acct_x -= $amt * (1.0 + sc_rate);
    acct_y += $amt;
}

# Right: sc_rate is interpreted at source and passed by value
replicate transaction "transfer #zzz" {
    %sc_rate = $sc_rate;
    acct_x -= $amt * (1.0 + %sc_rate);
    acct_y += $amt;
}

# Right: service charge is calculated at source
# works, but not good for auditing
amt_with_sc = amt * (1.0 + sc_rate);
replicate transaction "transfer #zzz" {
    acct_x -= $amt_with_sc;
    acct_y += $amt;
}

# Right: service charge as separate transaction
sc = amt * sc_rate;
replicate transaction "transfer #zzz" {
    acct_x -= $amt;
    acct_y += $amt;
}
replicate transaction "service charge for #zzz" {
    acct_x -= $sc;
}
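
The difference between the first two forms can be simulated in a few lines of Python. This is only a sketch under stated assumptions: each site is a plain dict, the helper names (`make_site`, `replicate_by_reference`, `replicate_by_value`) are invented here, and the opening balance of 1000 is arbitrary. It stands in for the pseudo-syntax above rather than for any real replication engine.

```python
from decimal import Decimal


def make_site(sc_rate: str) -> dict:
    """A site with its own local view of the service-charge rate."""
    return {"sc_rate": Decimal(sc_rate),
            "acct_x": Decimal("1000"),
            "acct_y": Decimal("0")}


def apply_transfer(site: dict, amt: Decimal, sc_rate: Decimal) -> None:
    """Debit acct_x by the amount plus service charge, credit acct_y."""
    site["acct_x"] -= amt * (1 + sc_rate)
    site["acct_y"] += amt


def replicate_by_reference(rate_a: str, rate_b: str, amt: Decimal):
    """Wrong form: each site looks up sc_rate in its own state."""
    a, b = make_site(rate_a), make_site(rate_b)
    apply_transfer(a, amt, a["sc_rate"])  # processed at source A
    apply_transfer(b, amt, b["sc_rate"])  # replayed at B with B's rate
    return a["acct_x"], b["acct_x"]


def replicate_by_value(rate_a: str, rate_b: str, amt: Decimal):
    """Right form: the rate is captured at the source and shipped with the request."""
    a, b = make_site(rate_a), make_site(rate_b)
    rate = a["sc_rate"]                   # interpreted once, at source A
    apply_transfer(a, amt, rate)
    apply_transfer(b, amt, rate)          # B uses the shipped value
    return a["acct_x"], b["acct_x"]
```

If A still has a rate of 0.01 while B has already moved to 0.02, the by-reference version leaves the two copies of acct_x disagreeing after the same transfer, while the by-value version keeps them identical.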

In an ideal world, the interface and behavior for the replication subsystem would disallow or strongly discourage the wrong form. For example, it could require that any values meant to be interpreted or modified at the destination must be explicitly listed or tagged, and reject anything that abuses “extraneous” variables as in the first form above. (Auto-conversion of the first form into the second is likely to introduce its own kinds of unexpected behavior.) That would force people to use one of the methods that actually works.
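
As a rough illustration of what such a check might look like, here is a Python sketch that rejects any request mentioning a name that was neither bound at the source nor explicitly declared as destination state. The function `check_request`, the exception `UnboundVariableError`, and the request layout are all hypothetical. The sketch also assumes the statements happen to be valid Python syntax (as the ones above are, minus the `$` and `%` sigils and trailing semicolons), so that the standard `ast` module can find the names in them.

```python
import ast
from typing import Dict, Iterable, List


class UnboundVariableError(ValueError):
    """Raised when a request uses a name that was not bound or declared."""


def check_request(statements: List[str],
                  bound_values: Dict[str, object],
                  destination_vars: Iterable[str]) -> None:
    """Reject statements that refer to "extraneous" variables.

    bound_values:     names whose values were captured at the source.
    destination_vars: names explicitly tagged as living at the destination.
    """
    allowed = set(bound_values) | set(destination_vars)
    for stmt in statements:
        tree = ast.parse(stmt)
        for node in ast.walk(tree):
            # Every identifier in the statement shows up as an ast.Name node.
            if isinstance(node, ast.Name) and node.id not in allowed:
                raise UnboundVariableError(
                    f"{node.id!r} in {stmt!r} is neither bound at the "
                    f"source nor declared as destination state")


TRANSFER = ["acct_x -= amt * (1.0 + sc_rate)", "acct_y += amt"]
ACCOUNTS = ["acct_x", "acct_y"]
```

Calling `check_request(TRANSFER, {"amt": 100}, ACCOUNTS)` raises because `sc_rate` was left to be resolved at the destination. Adding `"sc_rate": 0.01` to the bound values makes the same request acceptable.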