Canned Platypus

Making the world better, one byte at a time.


About eight years ago, I wrote a series of posts about server design, which I then combined into one post. That was also a time when debates were raging about multi-threaded vs. event-based programming models, about the benefits and drawbacks of TCP, etc. For a long time, my posts on those subjects, with that server-design article as their centerpiece, constituted my main claim to fame in the tech-blogging community, until more recent posts on startup failures, the CAP theorem, and language wars started reaching an even broader audience. Now some of those old debates have been revived, and Matt Welsh has written a SEDA retrospective, so maybe it’s a good time for me to follow suit and see what I and the rest of the community have learned since then.

Before I start talking about the Four Horsemen of Poor Performance, it’s worth establishing a bit of context. Processors have actually not gotten a lot faster in terms of raw clock speed since 2002 – Intel was introducing a 2.8GHz Pentium 4 then – but they’ve all gone multi-core with bigger caches and faster buses and such. Memory and disk sizes have gotten much bigger; speeds have increased less, but still significantly. Gigabit Ethernet was at the same stage back then that 10GbE is at today. Java has gone from being the cool new kid on the block to being the grumpy old man the new cool kids make fun of, with nary a moment spent in between. Virtualization and cloud have become commonplace. Technologies like map/reduce and NoSQL have offered new solutions to data problems, and created new needs as well. All of the tradeoffs have changed, and of course we’ve learned a bit as well. Has any of that changed how the Four Horsemen ride?

And now for something completely different…

I got this camera a few months ago after I’d lost its predecessor, and I think it has become my favorite out of the half-dozen or so digital cameras I’ve had over the years. You can find specs etc. anywhere, but here are some of the highlights from my perspective.

  • It’s small, easy to use, and takes video as well as stills, so it’s very convenient to bring everywhere.
  • Battery life seems excellent.
  • Start-up and between-picture times are better than average. This was one of my main selection criteria.
  • It doesn’t have any of the focus, exposure, or color-balance problems that I’ve seen in other cameras (especially its immediate predecessor).
  • When I zoom in, the pictures seem remarkably noise-free, so they resize well (using iPhoto; Facebook’s auto-resize is amazingly bad, so don’t judge by that) and I can often skip a clean-up stage.

That’s really it. I’m no expert, just a casual family photographer, but that’s a description that fits many people, so maybe someone will find my positive experience useful.

One of the dangers of making something easier to do is that a lot of less-skilled people will start doing it. One familiar example of this is writing multi-threaded code. All of a sudden everyone’s doing it, the vast majority without any understanding of the principles behind writing good multi-threaded code, so an awful lot of them make a complete hash of it. The same is beginning to be true of distributed code. The example that has been on my mind lately, though, is filesystems.

FUSE has made it a lot easier to write filesystems, so a lot more people are doing it. I generally consider that a good thing, and (unlike many of my kernel-filesystem-developer colleagues) I’m not going to look down my nose at FUSE filesystems just because they’re FUSE. After all, I just finished writing CassFS in my spare time. On the one hand, it illustrates just how easy it can be to slap a basic filesystem interface on top of something else. It took me about twenty hours’ worth of spare time, and I’m not Zed Shaw, so I’ll give credit to FUSE instead of pretending it’s proof of my own awesomeness. On the other hand, CassFS is also an example of how badly a FUSE filesystem can suck. I won’t go into details here, since I already did elsewhere, but my point is that CassFS is no worse than a bunch of other FUSE filesystems, and some of those projects’ authors still act as though their little brain-fart were equal to the more mature efforts out there. That does bug me. It’s great that technologies like FUSE allow people to do something that would previously have been out of reach for them. It’s not so great that the people who’ve been working on the truly hard problems in this area for ten years or more, and who might expect credit or even profit for those efforts, have to “share the stage” with people who just got basic read/write of a single file by a single process working.

That brings me to the real topic of this article. There are a lot of parallel/distributed filesystems and other data stores out there nowadays. Some of their authors are making pretty grandiose claims because their pet project does exactly one thing well, and when they tested that one thing against better-known alternatives, it didn’t do too badly. Well, sorry, but that doesn’t cut it. It’s like “racing” the guy in the car next to you who doesn’t even know you’re there, because he’s busy doing what he should be doing, which is paying attention to conditions up ahead. If you want your parallel/distributed filesystem to be taken seriously, you have to meet at least the following criteria.

  1. Support practically all of the standard filesystem entry points with reasonable behavior – not just read/write but link/symlink operations, chown/chmod, rename, stat returning reasonable info, etc.
  2. Have distributed metadata, not a single metadata server that acts as a single point of failure (SPOF) and bottleneck.
  3. Provide intra-file striping, both for high-performance access to a single file from one or many nodes/processes (the latter precluding whole-file locks) and for even data distribution across servers.
  4. Support RDMA-style as well as socket-style interconnects, also for high performance.
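Criterion 3 is probably the least self-explanatory of these, so here is a rough illustration. This is my own sketch of simple round-robin striping with a fixed stripe size, not a description of how Lustre, PVFS2, or GlusterFS actually lay out data; the names `locate`, `split_range`, `stripe_size`, and `num_servers` are hypothetical. Each byte offset maps to one server, and a single read or write range is split into per-server pieces that can be issued in parallel.

```python
from typing import List, Tuple


def locate(offset: int, stripe_size: int, num_servers: int) -> Tuple[int, int]:
    """Map a logical file offset to (server index, offset within that server's object).

    Hypothetical layout: stripe units of `stripe_size` bytes are dealt out
    round-robin across `num_servers`, and each server stores its units
    back to back in one local object.
    """
    stripe_index = offset // stripe_size       # which stripe unit holds this byte
    server = stripe_index % num_servers        # round-robin placement
    row = stripe_index // num_servers          # full rounds preceding this unit
    local_offset = row * stripe_size + offset % stripe_size
    return server, local_offset


def split_range(offset: int, length: int, stripe_size: int,
                num_servers: int) -> List[Tuple[int, int, int]]:
    """Split a byte range into per-server pieces: (server, local_offset, piece_length)."""
    pieces = []
    end = offset + length
    while offset < end:
        # Bytes remaining before the current stripe unit ends (or the range ends).
        piece = min(stripe_size - offset % stripe_size, end - offset)
        server, local = locate(offset, stripe_size, num_servers)
        pieces.append((server, local, piece))
        offset += piece
    return pieces


if __name__ == "__main__":
    # Tiny numbers for readability: 4-byte stripe units over 3 servers.
    for server, local, n in split_range(2, 8, stripe_size=4, num_servers=3):
        print(f"server {server}: {n} bytes at local offset {local}")
```

With 4-byte stripe units over three servers, an 8-byte request starting at offset 2 touches all three servers, which is what lets one client's I/O fan out. Because each piece lands in a distinct stripe unit, two processes writing different stripe units never touch the same server-side bytes, which is one reason a striped design can avoid whole-file locks.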

I’m aware of only three open-source alternatives that meet this standard, and dozens that don’t. Lustre failed criterion 2 when I worked on it, but claims to have gotten past that and I’ll give them the benefit of the doubt. PVFS2 also passes; some might quibble about whether their explicit rejection of certain obscure POSIX requirements allows them to meet criterion 1, but I think they’re close enough. GlusterFS also passes, though there’s some room for improvement on criterion 4. Of the rest, I suspect NFS4/pNFS advocates are the most likely to show up and object, but I don’t think NFS4/pNFS are even in the right space. They’re protocols, not implementations, and the existing open-source implementations don’t even address how to use the protocol features that were put in for this sort of thing. As far as I know, most if not all multi-server NFS4/pNFS implementations have used some other parallel filesystem on the back end to handle that, and it’s those other parallel filesystems (PVFS2 in one case but more often proprietary) that I’d consider.

If what you want is a real, mature parallel filesystem to deploy today, these are the ones you should look at. In another year or two, maybe some other very exciting and promising projects will join the list. Ceph is my favorite candidate, along with POHMELFS and HAMMER. Such things are great to play with, but I don’t think I’ll be putting my home directory on one. Come to think of it, I never got around to putting my home directory on any of the Big Three either. Maybe once I’m done with my current subproject I’ll take a big bite of my own dogfood.

You won’t learn anything about cloud application architectures from this book. Author George Reese never even mentions (deep breath) memcached or other key/value stores, non-SQL/non-ACID databases, map/reduce, SOA, messaging services, ORMs, Terracotta/EhCache (or anything else in the Java or Ruby ecosystems), cloud storage, etc. He devotes a whole two sentences to sharding: one to mischaracterize it as a fault-containment (not performance) technique, and another to dismiss it out of hand. He discusses the problem of lock-based concurrency not scaling, but never mentions alternatives such as the actor model or STM; his “solution” is to stick everything in MySQL and let it manage concurrency. These are all startling gaps in what an actual cloud application architect, let alone an author on the subject, should know.

So, what will you find in the book? There’s some pretty decent information about security, capacity planning, and disaster recovery. None of that is very specific to cloud, of course, and it tends to be operational rather than architectural knowledge, but he does a decent job. There’s also some good information about running applications on EC2, including deployment and management of your own AMIs. You’ll also find some strong but unsubstantiated (and mostly inaccurate) claims about Amazon EC2 reliability, virtualized I/O performance, and other subjects.

Lastly, you’ll find ads. The appendices written by Rackspace and GoGrid, and attributed as such, I find minimally useful but not particularly annoying. What really annoyed me was the blatant ad copy for Cleversafe, inserted without attribution into the text. There’s little evidence elsewhere in the text that Reese ever used Cleversafe or understands its technical underpinnings. Its essential nature is quite contrary to the DB-centric cargo cultism evident elsewhere in the book (which inspired my post on the subject). It’s totally out of place and inappropriate.

Perhaps I’d be less critical (except for the advertising part) if the title said Cloud Application Deployment. If you’re in IT and want to know how to take an old-school DB-centric low-scale application and deploy it in the cloud, you’ll find the book quite useful. If you’re an architect and want to know how to develop a high-scale cloud application, it might be worse than useless. It might actually misinform and mislead you, leading to an application that doesn’t work.

UPDATE: I still haven’t found any good books on this subject, but at least Amazon has a couple of good articles.

As mentioned in my previous post, I’m at Cindy’s college reunion for the next few days. This means I’ll be busy most of the time. Even when I’m not, it looks like Colgate’s network is pretty restrictive about anything other than basic browsing for guest users. I can’t be bothered to circumvent that, so I can only check email if I switch over to the cell phone, which I won’t be doing much. I know a lot of people are trying to contact me right now; I’ll get back to all of you on Monday.

I might learn to knit just so I can make this.

For months, I’ve been getting calls on my cell phone about extending the warranty on my Subaru. The one I sold and don’t have any more. These calls were the main reason I’d all but stopped answering calls from unrecognized numbers (particularly unrecognized area codes), because they accounted for a majority of such calls. Now it seems that I’m not alone.

Earlier this year, I was receiving daily calls to my cell phone from an automated voice offering an extended auto warranty.

I’m an AT&T customer, but these car warranty auto-calls have apparently also been plaguing Verizon Wireless customers.

The companies are accused of using an autodialer to contact Verizon Wireless customers, and of masking the origin of their calls. Since January 2008, more than 2 million customers have received such calls.

Pre-recorded voice messages say that a recipient’s car warranty is about to expire, and tells them to press “1” for more information. If the call recipient does, he or she is connected to a person who asks for the make and model of the car. If the recipient asks for information about the company, the operator hangs up.

As the story explains, such behavior is not only annoying but out-and-out illegal, and a $50K fine doesn’t seem anywhere near as big as it should be. The people behind these operations should go to jail, not only for the direct annoyance and loss of time they’ve caused but as an example and discouragement to others. I know no such thing will ever happen, but at least now maybe I can start answering my phone again.

Feeling Old (Apr 20)

Besides being Patriots’ Day, this is also the thirtieth anniversary of my arrival (back) in the United States. Thirty years. Wow. I can’t quite get used to the idea that I’m old enough for an entire era of my life, full of memories, to have ended thirty years ago. My time spent in New Zealand is already far less than my time spent in Massachusetts. In a few years it will be less time than I’ve spent in Lexington, and less than a quarter of my life overall. It’s sad to think that my entire childhood is now so long ago and so far away.

I guess the piracy situation is on everyone’s minds now, and I hear that we’re even considering action against pirate bases on land. It doesn’t sound like all that bad an idea, actually, assuming it’s a limited action in concert with the Somali government and strongly associated with humanitarian efforts to alleviate the conditions in Somalia that have led to piracy. I do wonder about other actions we might take, though. Clearly escorts – US or other – haven’t worked as well as we might have hoped. Do we need more escorts? Alternatively, what about putting heavily armed troops on the ships as they travel through the area? Besides potentially being able to repel boarders, such an approach would put pirates on notice that they’d be attacking the US no matter which nation’s flag the ship flies under, and that might make them reconsider. Yes, that’s a lot of ships and a lot of troops, but I’d be interested in how it compares to the logistics involved in a land action. Or maybe it’s a stunningly bad idea for other reasons. What do others think?

I was just talking to a coworker about this, and realized it was worth a post. Better yet, it’s worth a picture.

I am bleeping sick of bright, flashing blue LEDs on just about every piece of electronic gear. One of the worst offenders is a Netgear WNR854T wireless router/AP that I bought recently. One of the activity lights is blue. It flashes a lot. It flashes even when there’s no reason to believe that there’s really any of the traffic that it supposedly indicates. It’s so bright that it casts a shadow all the way down the hall in the middle of the night – a flashing shadow, while I’m trying to go downstairs for a drink without turning on the hall light. Of course, if I look for even one moment at the light that’s casting the shadow – as everybody’s brain is hard-wired to do – then I’m blinded because blue is the worst color for destroying night vision. It took me all of one night to tape cardboard over that, but then there’s the (fortunately less bright) blue LED on the NAS box next to it, and the two blue LEDs on the computer next to that (which is usually off), and way too many others.

Manufacturers: stop putting bright blue LEDs on every darn thing, especially ones that flash. Tone them down (resistors are cheap) and/or use more reasonable colors. Blue LEDs were cool half a decade ago, but now they’re just annoying.