Canned Platypus

Making the world better, one byte at a time.

Archive for the ‘internet’ Category

I find it fascinating how links get distributed over time. Here’s an example involving the amazing pencil-tip sculptures by Dalton Ghetti, and the times I’ve been presented with the same link in Google Reader.

I predict that it will show up on at least one more website I follow. In maybe a week or so, I’ll see it in print media for the first time. Another week or two after that, I’ll see it in the Boston Globe. That’s the usual pattern, anyway.

I just thought of a new metric: followers per tweet. I’m at about 0.43, which is pretty middle of the road. I see some people who are flirting with the 0.1 mark. At the other end of the scale I see some who are at 3.0 or better. Not too surprisingly, the first group are disproportionately likely to end up on my “whale-jumpers” list, which I check less often, and a similarly disproportionate number of my favorite tweeple seem to be in the second group. I could therefore expect to improve the “quality” of my own personal Twitter stream by checking this ratio for people I’m thinking of following . . . and you could do the same for your own personal stream as well.

BTW, if you want to help me pump up my own ratio, I’m @Obdurodon. ;)
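
For anyone who wants to try the followers-per-tweet check on their own stream, here is a minimal sketch of the arithmetic. The function names, the labels, and the sample counts are all made up for illustration; the 0.1 and 3.0 cutoffs are just the anecdotal marks mentioned above, not any kind of calibrated threshold.

```python
def followers_per_tweet(followers: int, tweets: int) -> float:
    """Followers divided by tweets; undefined for an account with no tweets."""
    if tweets <= 0:
        raise ValueError("ratio is undefined for an account with no tweets")
    return followers / tweets


def classify(ratio: float, low: float = 0.1, high: float = 3.0) -> str:
    """Bucket a ratio using the (purely anecdotal) 0.1 and 3.0 marks."""
    if ratio <= low:
        return "probably noisy"
    if ratio >= high:
        return "probably worth following"
    return "middle of the road"


# Hypothetical account: 430 followers, 1000 tweets.
ratio = followers_per_tweet(430, 1000)
print(f"{ratio:.2f}", classify(ratio))
```

With those hypothetical counts the ratio comes out to 0.43, which lands in the middle bucket. Real follower and tweet counts would have to come from wherever you normally look them up; nothing here talks to Twitter.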

I’ve been on vacation for the last few days, and while I was (mostly) gone a few interesting things seem to have happened here on the blog. The first is that, after a totally unremarkable first week, my article It’s Faster Because It’s C suddenly had a huge surge in popularity. In a single day it has become my most popular post ever, more than 2x its nearest competitor, and it seems to have spawned a couple of interesting threads on Hacker News and Reddit as well. I’m rather amused that the “see, you can use Java for high-performance code” and the “see, you can’t…” camps seem about evenly matched. Some people seem to have missed the point in even more epic fashion, such as by posting totally useless results from trivial “tests” where process startup dominates the result and the C version predictably fares better, but overall the conversations have been interesting and enlightening. One particularly significant point several have made is that a program doesn’t have to be CPU-bound to benefit from being written in C, and that many memory-bound programs have that characteristic as well. I don’t think it changes my main point, because memory-bound programs were only one category where I claimed a switch to C wouldn’t be likely to help. Also, programs that store or cache enough data to be memory-bound will continue to store and cache lots of data in any language. They might hit the memory wall a bit later, but not enough to change the fundamental dynamics of balancing implementation vs. design or human cycles vs. machine cycles. Still, it’s a good point and if I were to write a second version of the article I’d probably change things a bit to reflect this observation.

(Side point about premature optimization: even though this article has been getting more traffic than most bloggers will ever see, my plain-vanilla WordPress installation on budget-oriented GlowHost seems to have handled it just fine. Clearly, any time spent hyper-optimizing the site would have been wasted.)

As gratifying as that traffic burst was, though, I was even more pleased to see that Dan Weinreb also posted his article about the CAP Theorem. This one was much less of a surprise, not only because he cites my own article on the same topic but also because we’d had a pretty lengthy email exchange about it. In fact, one part of that conversation – the observation that the C in ACID and the C in CAP are not the same – had already been repeated a few times and taken on a bit of a life of its own. I highly recommend that people go read Dan’s post, and encourage him to write more. The implications of CAP for system designers are subtle, impossible to grasp from reading only second-hand explanations – most emphatically including mine! – and every contribution to our collective understanding of it is valuable.

That brings us to what ties these two articles together – besides the obvious opportunity for me to brag about all the traffic and linkage I’m getting. (Hey, I admit that I’m proud of that.) The underlying theme is dialog. Had I kept my thoughts on these subjects to myself or discussed them only with my immediate circle of friends/colleagues, or had Dan done so, or had any of the re-posters and commenters anywhere, we all would have missed an opportunity to learn together. It’s the open-source approach to learning – noisy and messy and sometimes seriously counter-productive, to be sure, but ultimately leading to something better than the “old way” of limited communication in smaller circles. Everyone get out there and write about what interests you. You never know what the result might be, and that’s the best part.

(Dedication: to my mother, who did much to teach me about writing and even more about the importance of seeing oneself as a writer.)

I’ve never removed a comment on this blog, even in fairly extreme situations. There are many reasons, including a general dislike of censorship and the notion that once I start policing content I become responsible for that which remains. There are also some purely practical concerns related to the near impossibility of such moderation actually being helpful. As the most active moderator on a forum through nearly a million contentious posts, I learned a few lessons that also apply to blogs.

  • Whipping out the moderator hat with no prior attempts to persuade or warn people only convinces them that you’re more interested in directing than in participating.
  • If moderation is necessary, it’s better to moderate an entire line of discussion instead of trying to make and enforce hard-to-defend distinctions between one comment and another. Everybody – participants and observers alike – has their own idea who initiated a thread’s decline. People’s annoyance at being “caught in the net” is nothing compared to their anger at being singled out.
  • Deleting comments is a bad idea. Often, a comment will contain both a good part and a bad part. At best, deleting both leaves the remainder of the conversation disconnected and nonsensical. At worst, it also tells people that the effort they put into the good part means nothing to you. Close comments, mark bad ones, but don’t delete.

Unfortunately, in the thread elsewhere that inspired my own recent post about standards, Herb Sutter managed to make every one of these mistakes. I’m not saying that he was unjustified in taking action; it’s his blog, he wants traffic, and I for one had stopped visiting that thread because I got tired of being called a “freetard” and such by some of the other participants. What I’m saying is that the action he took was ill-considered and poorly executed. For example, I can see that one comment, which was 95% serious commentary with one rude remark, was removed in toto; the “tinfoil hat” insult to which that one remark was a response was allowed to stand. How this fits Herb’s desire for “respectful disagreement” is a mystery, and it’s hard to escape the conclusion that the “moderator” was reacting to what was said rather than how. Whether intentionally or not, he has done less to improve the tone of the discussion than to influence its direction.

That such authoritarian and inconsistent actions were taken in the context of a thread about standards and openness is particularly telling. For too many people, those terms are just marketing hooks and not sincerely held principles. We already have “greenwashing” for the same phenomenon as it applies to ecological concerns. We need a similar term for people who talk the inclusive talk but don’t walk the walk. It’s a shame really, because I think that overall Herb is a good guy who brings great value to the techie blogosphere, but in this particular instance he seems to have taken a stance against responsible blogging.

Do you use Twitter? Do you make sure all your “tweets” are backed up? No? What’s wrong with you? After all, knowing where you went for lunch last Tuesday might be invaluable to future biographers as they document your rise to glory. Not to worry. Now you can use Backupiphish to make sure all of your important online data gets backed up. Just give us all of your social-networking site passwords – you know, the same ones you probably use on your banking sites as well, even though you’ve been told not to do that – and we’ll make sure all of your stuff is nice and safe in our very own Swiss bank accounts. Um, I mean, your data will be on our secure servers in a secure facility protected by the very best guards we can hire for minimum wage, plus our patented 1025-bit encryption. And the best part? It’s completely free! You don’t have to pay us a cent for our useless service. Just your passwords. Don’t forget to send your passwords. You won’t mind if the very first thing we do with your Twitter account info is impersonate you to advertise our service, right? That shows you can trust us. Oh, and using four-letter words makes us seem all edgy and stuff, so make sure you follow #itsour****now on Twitter.

This post has nothing whatsoever to do with Backupify. No sir, not at all. Pure coincidence.

Feb 13: New Tools

Ever since my rant about the sad fate of KScope, I’ve been using vim and cscope and I’m happy to report that the combination is working very well for me. The tutorial is adequate, but I seem to recall a fair amount of web-searching and plain old experimentation before I really felt comfortable with it. The nicest thing is that it works just as well when I’m logged in from home as it does when I’m sitting at the machine where the files reside.

On a semi-related note, I finally got tired of the constant “not quite transparent” updates to Firefox. At least once a week, Firefox updates itself, and can supposedly do it without a restart, but what really happens is that it starts getting more and more flaky until I get so sick of it that I restart anyway. For example, one update caused all dialog boxes to show up in teeny-tiny little windows which were not much more than a truncated title bar and about ten vertical pixels’ worth of content. Other times things just start to hang. I’ve been burned by this at least a dozen times, and finally felt motivated to do something about it. After experimenting with Chrome, I’ve switched to Opera. Built-in bookmark synchronization is one thing that’s nice about it. Also, Opera does lots of things besides browsing, so for example I’m also using it for IRC now so I have one less window lying around. I’d consider using it for email too, thus getting rid of Thunderbird as well as Firefox, but the email client seems to lack a lot of features including LDAP support so I probably won’t. There are also some things I’m used to in Firefox that I might have to find Opera-friendly equivalents for. Chief among these:

  • FoxyProxy: I use this one a lot, so I might have to start running TinyProxy or similar (preferably something with an easy interface for switching between one rule set and another).
  • AdBlock Plus: I haven’t noticed any problems with ads or popups, so maybe I won’t need it, but if I do I’ll probably look into auto-converting EasyList into Privoxy scripts or something.
  • Tree Style Tab: This has become one of my favorites. At least I can put my Opera tab bar on the left, but I had gotten used to TST’s hierarchical nature. Opening a folder full of bookmarks as a tree which can be collapsed or all closed with a single click is very intuitive and useful. Ditto for clicking on all of the interesting links as I’m reading an article, or for opening up a bunch of tabs while surfing eBay or CafePress. I don’t know if there’s much I can do about this one, except decide whether I miss the functionality enough to put up with the Firefox update SNAFU some more.

Necessity is the mother of change, and change is good sometimes. If nothing else, using different tools gives me a bit of that “learning cool new stuff” experience I remember from my early days with computers, and which I don’t get enough of nowadays. It almost doesn’t matter whether I continue using the new tools or switch back to the old ones.

If I mention a public company on this blog, and somebody else posts a link to me on Yahoo! Finance, am I constrained with respect to buying or selling that company’s stock? As absurd as it might seem, I suspect the practical answer is yes. Certainly the person posting to Y!F would be subject to such constraints; there have been many cases regarding that “pump and dump” tactic over the years, and rightly so. I suppose that two unscrupulous people could try to avoid such scrutiny by having the blogger do the trading so that the Y!F poster’s hands appear clean. I’m not even sure it would work, and it would require a far more influential blog than this one to have any effect on price, but it’s certainly not hard to imagine someone trying.

I wasn’t planning to buy or sell anything I’ve mentioned here, but it does irk me a bit that somebody I don’t even know could create that kind of trouble for me. If I ever do start picking individual stocks, I guess I’ll have to start paying attention to inbound links as well.

Cargo cults are an extreme form of the belief that replicating someone’s behavior will lead to replicating their results. In moderation, this belief seems quite reasonable. After all, imitating the actions of those who do something better than we do is a key part of how we learn. It’s often more efficient or less painful than trial and error. The name, though, comes from technologically unsophisticated Pacific islanders observing US troops in World War 2. The islanders got the idea that by imitating the troops’ behavior they would reap the same rewards in the form of “cargo” – air-dropped supplies which were highly valuable to them. The problem, of course, is that the islanders lacked knowledge not only of the technology that was involved but also of the military context behind much of the behavior they were observing. Thus, and constrained also by their own resources, they ended up imitating only the most superficial aspects of that behavior, with often comical and occasionally tragic results.

Cargo cultism isn’t limited to “primitive” people, though. In fact, it’s very common among programmers. Many people think that if they use the same operating system or programming language or text editor as Joe Rockstar, then they will achieve rock-star results themselves. If I do this thing that I don’t really understand, my code will be as secure as Joe’s. If I repeat that mantra in my code, it will be as scalable as Joe’s. There’s a heated discussion on the cloud-computing list at Google Groups, about “NoSQL” (more correctly “non-transactional”) data stores vs. traditional ACID-compliant RDBMSes, and the term “cargo cult” has been applied to both sides – correctly, I think. Many NoSQL advocates do indeed seem like cargo cultists. Their code will never see a scale where a free, well understood and well supported traditional database wouldn’t suffice, but they think that imitating the approaches of the highest-scale sites will bring them great success . . . somehow. There are legitimate problems with this magical belief. Unfortunately, the most strident dissent often comes from the cargo cultists of Web 1.0 and earlier, who subscribe to an equally magical belief that putting all of your data into a transactional RDBMS will solve all of your problems. It’s a battle for supremacy between two cargo cults, not a repudiation of cargo cultism itself. This pattern is actually quite common, unfortunately, usually because the participants on both sides have failed to ask two crucial questions:

  • Were the people I’m imitating actually successful in some relevant way? If somebody’s claim to fame rests on being at six dot-coms (plus a blog and hyper-active participation on mailing lists) then I’d suggest not imitating them. Even if they succeeded in selling a dot-com for big money, that might only indicate business – not technical – ability. Don’t copy code that sank into oblivion after its authors became millionaires; it probably sank for a reason.
  • Is the behavior I’m imitating essential to success, or merely incidental? Distinguishing the essential from the superficial requires drawing the lines from particular behaviors to particular results, which in turn requires understanding the technical context. Imitation can provide shortcuts in implementation, not in understanding; you still need to do the hard work of understanding the technology involved.

It’s easy to fall into the trap of imitating the unsuccessful all too well or imitating the successful quite poorly. Even smart people often end up waiting their whole lives for that cargo to arrive. Distinguishing success from notoriety and substance from style is often harder than mastering the specific skills needed to solve a problem. However, those who learn to ask these questions and imitate only the essential behaviors of the truly successful stand a good chance of succeeding themselves.

Thanks to Isis for bringing these very talented folks to my attention. (Warning: lots of not-so-family-friendly language.) I didn’t particularly care for the song she linked to, but – as somebody who has enjoyed Uncle Bonsai and similar acts for years – it was enough to check out a few others. Sex With Ducks is a brilliant satire of anti-gay-marriage slippery-slope arguments, dedicated to Pat Robertson. So far, though, my favorite is Self Esteem.

My self-esteem’s not low enough to date you.
It’s close, but not quite there.

Ouch.

The hot topic in the blogosphere right now is about BitTorrent Inc. switching to a UDP-based protocol for their popular uTorrent client. Most of the links on this subject ultimately lead back to Bittorrent declares war on VoIP, gamers at El Reg. Having talked to Richard Bennett before, I’m well aware of his penchant for saying outrageous things to get attention, but most of his critics are behaving even worse. For example, here’s Janko Roettgers.

Bennet’s piece is based on a belief that UDP traffic is “aggressive” and uncontrollable, whereas TCP is the nice and proper protocol that can be easily managed. This notion ignores the basic fact that P2P developers, in order to make the protocol work at all, need to implement TCP-like functionalities on top of UDP, one of which includes congestion control. You simply can’t operate a P2P client that eats up all of its users’ bandwidth, much less build a successful business model on top of it.

That’s an unfortunately, not to say selfishly, client-centric perspective. For one thing, eating up all of the user’s own bandwidth is not the issue; crowding out other users’ traffic is. For another, Roettgers completely ignores what happens to packets in between two PCs on the internet. The fact is that all those routers have mechanisms in place to do congestion control, and – while there are proposals for TCP-friendly rate control out there – many people have traditionally turned to UDP for the specific purpose of bypassing congestion control. As George Ou pointed out a while ago, P2P applications tend to “game the system” in effect and probably by intent. I know from my own conversations with Bram Cohen that he has never liked TCP, so it doesn’t seem at all unlikely that the UDP switch is quite intentionally meant to avoid congestion control, despite the inevitable “collateral damage” to non-BitTorrent users, which would make it quite antisocial. Bennett offers one further reason to believe this.

Upset about Bell Canada’s system for allocating bandwidth fairly among internet users, the developers of the uTorrent P2P application have decided to make the UDP protocol the default transport protocol for file transfers.

I don’t know where Bennett got the Bell Canada angle, and considering the source I’d be interested in any pointers to relevant statements by Cohen or anyone else at BitTorrent, but it certainly seems plausible. What I’d be most interested in, though, is any credible alternative explanation. “It will make BitTorrent downloads faster without affecting anyone else” is just another way of saying there is such a thing as a free lunch. The claims about BitTorrent’s uTP doing congestion control better than TCP are suspect too, since details on uTP don’t seem to be available for us to review and the one detail I’ve seen mentioned – using latency to detect congestion – doesn’t exactly represent the current state of knowledge among people who really understand congestion control. Can anyone explain, without hand-waves and exaggeration, why uTorrent is switching to UDP other than to consume a greater share of bandwidth than existing mechanisms to preserve robustness and fairness in the internet would allow?
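
For readers wondering what “using latency to detect congestion” could even look like, here is a rough sketch of the general idea. It is emphatically not uTP’s algorithm, whose details aren’t available to review; it is a generic delay-based window controller, and every name and constant in it (the class, the 100 ms target, the gain) is an assumption made up for illustration. The sender remembers the lowest delay it has ever seen as a stand-in for the uncongested path, treats anything above that as queuing delay, and grows or shrinks its window depending on whether the queuing delay is under or over a target.

```python
class DelayBasedWindow:
    """Toy delay-based congestion window; all parameters are illustrative."""

    def __init__(self, target_delay_ms=100.0, gain=1.0,
                 min_window=1.0, initial_window=2.0):
        self.target = target_delay_ms   # queuing delay we are willing to cause
        self.gain = gain                # how hard to react to each sample
        self.min_window = min_window    # never shrink below this many packets
        self.window = initial_window    # current window, in packets
        self.base_delay = None          # lowest delay seen so far

    def on_delay_sample(self, delay_ms):
        """Update the window from one measured delay and return it."""
        if self.base_delay is None or delay_ms < self.base_delay:
            self.base_delay = delay_ms
        queuing_delay = delay_ms - self.base_delay
        # Positive when under the target (grow), negative when over (shrink).
        off_target = (self.target - queuing_delay) / self.target
        self.window = max(self.min_window,
                          self.window + self.gain * off_target)
        return self.window


ctl = DelayBasedWindow(initial_window=10.0)
for sample in (50.0, 250.0, 150.0):
    print(sample, ctl.on_delay_sample(sample))
```

Note what the sketch takes on faith: that the minimum observed delay really reflects an empty path, and that rising delay is caused by this sender’s own traffic. Whether a scheme like this shares bandwidth fairly with TCP flows that only back off on loss is exactly the kind of question that needs published details to answer.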