RPC and Protocol Design

Every so often, people notice that I grimace when RPC is mentioned. I’ll try to explain why. Firstly, let’s get it out of the way that “RPC” can mean many things. Sometimes when people say “RPC” they mean it in a very generic sense, to refer to any facility for calling functions remotely. Other times they very specifically mean Sun RPC version X or DCE RPC version Y or CORBA version Z. I’m going to use “RPC” more in the generic sense, but with a recognition that some particular comments might not apply to some particular kinds of RPC.

So…what’s wrong with RPC? It’s a very useful abstraction, isn’t it? Well, yes it is, but every abstraction has limits. RPC is to a certain extent a victim of its own success. Because it provides such a useful abstraction for such a common and important class of remote interactions, people forget that there are other kinds of remote interactions for which the RPC abstraction is inadequate. For example:

  • RPC involves both a request and a response, just like a function is invoked and then returns. But what about messages for which no response is necessary? Many applications can make very effective use of various “hints” that can safely be dropped and therefore need not even be acknowledged, and no state need be maintained on the sender regarding their status. The RPC model forces every request to be tracked and responded to individually, making such “hinting” significantly more expensive than it needs to be (a small sketch after this list illustrates the difference).
  • Similarly, the RPC model incorporates a “round trip” assumption that can be too limiting. What if the logical message path for an operation is from A to B (who A thinks has a resource) to C (who B knows really has it) and back to A? This pattern appears frequently in distributed applications, but in most kinds of RPC A will be unprepared to get a reply back from C instead of B. In some cases the desired effect can be obtained by issuing a fake multicast RPC, but that’s kludgy at best.
  • In addition to the “response from the wrong place” problem above, most forms of RPC are not very “proxy-friendly”. There are many reasons, but by far the most common is reliance on per-request timeouts. These are typically set by the requestor without even knowing whether proxies are present, and start to occur spuriously as proxy hierarchies deepen.
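
To make the first point above concrete, here’s a minimal sketch in Python. The peer address and the “cache hint” payload are invented for illustration; the point is just that the hint path keeps no per-request state and never blocks waiting for an acknowledgement, while the RPC-style call has to track every request until it is answered or times out.

    import socket

    PEER = ("127.0.0.1", 9999)   # hypothetical peer address

    # RPC-style: every request is tracked and must be answered.
    def rpc_call(sock, payload, timeout=2.0):
        sock.settimeout(timeout)
        sock.sendto(payload, PEER)
        reply, _ = sock.recvfrom(4096)   # caller blocks; state held until reply or timeout
        return reply

    # Hint-style: send and forget.  No timeout, no retry, no per-request state;
    # if the datagram gets dropped, nobody cares.
    def send_hint(sock, payload):
        sock.sendto(b"HINT " + payload, PEER)

    if __name__ == "__main__":
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        send_hint(sock, b"block 42 is probably cached at node C")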

For all these reasons, and probably more that I can’t think of or articulate right now, I’ve learned to distrust RPC. Don’t get me wrong; if your needs happen to match the assumptions embodied in the RPC model, you’d be crazy not to take advantage of the facilities that exist, and even if you’re not doing true RPC the data serialization/deserialization (a.k.a. marshaling) tools that are often associated with RPC packages are very useful. However, the choice to use RPC is one that should be made with a full awareness of its limitations.

The obvious next question is what to do when RPC is not appropriate. That’s a complicated topic; once you move beyond RPC you’re in the realm of protocol design, about which whole books have been written. The difficulty of designing a correct, robust and efficient protocol is in itself one of the best reasons to stick with RPC. The #1 rule of protocol design is, of course, not to do it if an existing protocol – RPC-based or otherwise – is at all adequate. Nonetheless, it is sometimes necessary and I have some suggestions for when it is:

  • Avoid timeouts as much as possible. If you need to use elapsed time to determine that a node or connection is dead, isolate that logic in a heartbeat protocol that provides “node/connection died” events. Have your main protocol retry requests indefinitely until success or until one of these events occurs. In most other situations there are better alternatives to using timeouts at all. (A sketch after this list shows one way to put this and several of the following suggestions into practice.)
  • Keep state transitions to a minimum. I don’t believe “stateless” protocols suffice in all situations, and they’re certainly not always the most efficient alternative, but huge state tables are generally the hallmark of poorly designed protocols. Keeping the state space small helps both in proving the protocol correct and in implementing it later.
  • “Future-proof” your protocol by using protocol version numbers, self-identifying message fields, capabilities negotiation, etc. I might write another whole article just on techniques in this area some day.
  • Validate your protocol. There are many tools available for this, and using any of them is far better than having to debug every race condition or deadlock/livelock “in the wild”. I’ve written my own protocol validator, but that was more as an exercise than anything else and for serious use I’d recommend Murφ instead.
  • When implementing your protocol, be extra-aggressive about adding checks for invalid state transitions (often identified as such during the validation process), so implementation errors can be caught sooner rather than later. This sort of checking is a good idea for any code, but it’s absolutely indispensable for protocol code.
  • Add lots of logging, at multiple levels – the messaging level, the “protocol engine” level, the higher-level request level. Leave the logging facilities in production code.
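
To make a few of these suggestions concrete, here’s a minimal sketch in Python; the states, events and transport interface are all invented for illustration, and a real protocol engine would obviously be much larger. It keeps the state table small and explicit, treats any transition not in the table as an implementation bug, retries indefinitely until the heartbeat layer delivers a “peer died” event, and logs at both the messaging and engine levels.

    import logging

    msg_log = logging.getLogger("proto.msg")        # messaging-level logging
    engine_log = logging.getLogger("proto.engine")  # protocol-engine-level logging

    # The entire state space, written out explicitly.  Anything not listed here
    # is an invalid transition and is treated as an implementation bug.
    VALID_TRANSITIONS = {
        ("IDLE",    "send_request"): "WAITING",
        ("WAITING", "got_reply"):    "IDLE",
        ("WAITING", "retry"):        "WAITING",
        ("WAITING", "peer_died"):    "FAILED",   # event supplied by the heartbeat protocol
    }

    class ProtocolEngine:
        def __init__(self, transport):
            self.state = "IDLE"
            self.transport = transport   # assumed to expose send(msg) and deliver callbacks

        def _transition(self, event):
            key = (self.state, event)
            if key not in VALID_TRANSITIONS:
                # Catch implementation errors as early as possible.
                raise AssertionError("invalid transition %r" % (key,))
            engine_log.debug("%s --%s--> %s", self.state, event, VALID_TRANSITIONS[key])
            self.state = VALID_TRANSITIONS[key]

        def send_request(self, msg):
            msg_log.debug("sending %r", msg)
            self._transition("send_request")
            self.transport.send(msg)

        # The following are called by the transport and heartbeat layers:
        def on_reply(self, msg):
            msg_log.debug("got reply %r", msg)
            self._transition("got_reply")

        def on_send_failed(self, msg):
            # No per-request timeout: just keep retrying until success or peer_died.
            engine_log.info("resending %r", msg)
            self._transition("retry")
            self.transport.send(msg)

        def on_peer_died(self):
            # Elapsed time only enters the picture inside the heartbeat protocol
            # that generated this event.
            engine_log.warning("peer declared dead; giving up")
            self._transition("peer_died")

Note that elapsed time appears nowhere in the engine itself; the only timeout in the system is hidden inside the heartbeat protocol that produces the peer_died event.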

There’s much more, but that should be a good start. I’ve designed successful protocols for everything from cluster membership to distributed lock management to cache coherency over the years, and if it weren’t for these rules I would have gone insane long ago. I hope that, by writing them down, I can help someone else learn from my mistakes instead of having to repeat them.

Posting Email Without Permission

Probably the most controversial part of my last post was about posting email. I know quite a few otherwise right-thinking people who have nonetheless adopted a “posting email is OK” attitude, so I’ll try to explain my own reasoning for why it’s not OK.

Basically it all comes down, for me, to a simple question: should posting email be an opt-in or an opt-out process? Put another way, should the person who sends email have to grant permission explicitly, or deny it explicitly? It seems like a simple enough question, but there’s a little bit of a trap there for people who think opt-out is OK. Specifically, consider this: spammers have exactly the same attitude, the only difference being that they’re thinking about the message envelope rather than its contents. I shouldn’t need to explain to anyone why spam is bad; I could, but that’s a topic best left for another article.

If spammers like opt-out for the message envelope, and you believe opt-out is OK for the message contents but don’t want to be considered morally equivalent to a spammer, you have only one choice: come up with a reason why the contents of a message deserve less protection than the envelope. Good luck. Absent such a reason, people who make a habit of posting email without permission are basing their actions more on lazy self-interest than on principle. Put another way, they really are morally equivalent to spammers.

Guidelines for Responsible Blogging

Recently I’ve had cause to think about abusive blogging. As a result, I’ve come up with a few simple rules that people can follow to avoid seeming like total jerks.

  1. Respect others’ privacy. Don’t post information or narrative about other identifiable people without prior permission.
  2. Be particularly careful when posting about people who don’t have their own weblogs. Such people have no forum to correct errors or present their own perspective other than through you; you owe it to them not to misrepresent things that happened.
  3. Don’t post email without permission. Even though this rule has been well known for over twenty years, the frequency with which it is violated by the ever-increasing tide of newbies has led some people to believe it’s a relic that no longer applies to the modern Internet. Those people are wrong. Email is still private communication, and should be treated as such. Offering to remove it after the fact is no substitute for asking for permission (which is often granted, in my experience) before.
  4. Even if you feel justified in ignoring any of the above rules, at least notify the person affected that you have done so. Maybe you can skip this if you’re absolutely sure that they read your weblog frequently, but even then notification can’t hurt. People that you write about deserve an opportunity to see what you wrote, determine how they feel about it, and possibly respond. “Stealth attacks” where you try to get in the last word on a subject by the simple expedient of not letting the other person know a response might be called for are a slimy tactic.
  5. If somebody objects to your discourtesy, at least try to consider the validity of their objection before treating it as an opportunity to toot your own horn some more, counterattack, etc.

I admit that I haven’t always been good about following these rules myself. As I said, I’ve recently had the issue of online-writing etiquette brought to my attention – by an individual whose tendency to break every one of these rules on a constant basis betrays a level of self-absorption and contempt for others’ sensibilities that should preclude him from participating in polite society. Having had my eyes opened in this way, though, and not wishing to seem in any way similar to that person, I’ll try to be a little bit more careful about what I write in the future.

Occupational Hazards of Writing Online

Sometimes it’s nice to know people are reading stuff I write here. Other times it’s not so nice. Supposedly in response to my previous article about multithreading, David McCusker wrote this:

He seems to think I don’t already plan hybrid solutions. Why is that? But I guess it’s easy and simple to assume everyone else is stupid.

Well, David, I apologize. No offense was intended. I didn’t mean to imply anything about your own work, and certainly not that you’re stupid; until now I hadn’t realized such an interpretation was even likely. I thought my jab at Paul Graham was pretty clearly a way of poking fun at such attitudes, but apparently I was mistaken. In the future, I’ll try to show more respect for others’ work, instead of assuming that everyone else who has ever worked on networking or data storage got it wrong and my project will reveal the One True Way. Such hubris is truly obnoxious, and I thank you for the warning.

Fast Food Economics

I recently read Fast Food Nation, and found it quite interesting on several levels. The book makes a lot of obvious points about overmarketing, health risks, worker safety in meatpacking plants, etc. One (slightly) less obvious point that came across very strongly to me was what the book revealed about the dangers of laissez-faire capitalism. We see example after example of how the big companies like IBP, ConAgra and ADM thrive on government subsidies and purchasing programs. Time after time, they practically buy laws that raise barriers to entry or otherwise adversely affect smaller competitors. And yet, every time there’s any hint of any regulation to address the dangers these companies pose to their workers, to consumers, or to the environment, all of a sudden they act like the free market will cure all ills and they’re its defenders. The level of hypocrisy they exhibit on this point is incredible, even to one relatively jaded about such things.

I guess some people would say I’m not a capitalist, and if we’re to make useful distinctions among such beliefs I think I’d even agree. I don’t believe in capital as the central concept in a successful economy. I believe in competition, which might make me a “competitionist”. I think competition is what drives people to innovate, to work harder, to be more responsive to customers (and other stakeholders). I also think that maintaining a competitive environment requires the presence of an impartial referee. Sorry, but the players can’t be trusted to run the game themselves. To the extent that the government acts as a player in the market (except as necessary e.g. to support the military) or as a biased referee (see above) I think that’s bad government. However, I have absolutely no objection to the government setting and strictly enforcing rules that benefit society – which is most often best served by maintaining a market environment in which new entrants can compete on the basis of merit instead of being stifled by the government or stomped on by entrenched trusts and cartels. Yes, it’s natural that companies should seek advantage through mergers and cooperation. It’s also natural that food rots. That doesn’t mean we should just accept the results instead of intervening to create a better overall outcome.

Tahoe Pictures

I finally got the Tahoe pictures rotated, resized, arranged, etc. etc. Phew! I ended up using 38 out of over 100, and even the thumbnails total about 1.3MB, so turn image loading off if you’re on a slow connection. As usual, clicking on a thumbnail will get you an 800×600 version (100-300KB), and you can send me email if you want any of the 1600×1200 originals.

Multithreading vs. Message Passing (part

I guess I might as well jump into the multithreading vs. message passing debate that has been making the rounds. The current meme seems to be that multithreading done improperly can do major damage to a program and its author’s sanity. Should I do a Paul Graham impression and say that people who can’t do multithreading properly are just too stupid to even have computers? Maybe some other time.

First, I’d like to point out that many problems (race conditions and deadlock/livelock, for example) are held in common between the two approaches, even though they might occur in superficially different ways. Other problems are clearly tradeoffs with no “right” answer. For example, many multithreaded systems exhibit poor performance due to excessive context switching, and message-passing advocates point out that their systems don’t have that problem. Well…

  • The excessive context switches are an implementation artifact, avoidable while retaining a multithreaded paradigm (I’ve done it on more than one occasion). They’re not an inherent problem with multithreading.
  • Message passing systems are only immune to the context-switch problem if they run everything in a single thread, which is to say that they’ve traded away multiprocessor scalability for simplicity and single-thread performance. Message passing systems that use multiple threads – yes, even staged systems like SEDA – are just as prone to excessive context switches as any multithreaded program ever was.
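
As a rough illustration of that second point, here’s a sketch (in Python, so treat it as illustration rather than benchmark; the numbers will vary wildly by machine, and the interpreter’s own locking muddies the picture) that pushes the same trivial work through a queue to a worker thread and then runs it as plain calls on one thread. Every queue hand-off is an opportunity for a context switch; the direct call is not.

    import queue, threading, time

    N = 100000

    def handle(x):
        return x + 1          # stand-in for real per-request work

    # Staged/message-passing style: a producer thread hands each request to a
    # consumer thread through a queue, so each hand-off can force a context switch.
    def staged():
        q, done = queue.Queue(), threading.Event()
        def worker():
            for _ in range(N):
                handle(q.get())
            done.set()
        threading.Thread(target=worker, daemon=True).start()
        start = time.perf_counter()
        for i in range(N):
            q.put(i)
        done.wait()
        return time.perf_counter() - start

    # "Subroutine call" style: the same work done inline on the caller's thread.
    def direct():
        start = time.perf_counter()
        for i in range(N):
            handle(i)
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("staged: %.3fs  direct: %.3fs" % (staged(), direct()))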

In short, I consider that particular form of advocacy, most often engaged in by the message-passing folks, intellectually dishonest. If a problem is not inherently worse, or less readily solvable, in one system than the other, then it is of no value in differentiating between them.

Secondly, I’d like to point out that I’ve written on this topic before. Here’s a scorecard from almost a year ago, more discussion, and some micro-benchmark results. Mostly these focus on single-thread message passing, but the results really don’t change when SEDA-like approaches are considered.

Lastly, my current thoughts. I’m dealing with these very issues in my current project, and the result has been (surprise!) a hybrid between the two approaches. The bulk of the code is decomposed into stages and looks very much like message passing, but the way that requests pass through stages is more like subroutine calls in a multithreaded system than the sort of dispatcher/scheduler that a pure message passing system would use. In addition, there are several maintenance-related tasks (e.g. a dirty-block flusher, a cache pruner) that iterate over private lists and then inject messages into the staged part of the system. So far, it’s working extremely well for me.
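
For the curious, here’s a very rough sketch of the shape of that hybrid; the stage names and maintenance task are invented for illustration, and this is the idea rather than the project’s actual code. A request moves from stage to stage by direct call on the current thread instead of going through a dispatcher’s queue, while maintenance tasks run on their own threads and inject work at a stage boundary.

    import threading, time

    class Stage:
        """A stage handles a request and passes it straight to the next stage
        via a subroutine call on the caller's thread, not a queued message."""
        def __init__(self, name, next_stage=None):
            self.name, self.next_stage = name, next_stage

        def handle(self, request):
            request.append(self.name)          # stand-in for real per-stage work
            if self.next_stage is not None:
                self.next_stage.handle(request)

    # The staged pipeline: requests flow parse -> cache -> disk.
    disk  = Stage("disk")
    cache = Stage("cache", disk)
    parse = Stage("parse", cache)

    def flusher(stop):
        """Maintenance task: iterates over its own private list of dirty blocks,
        then injects requests into the staged part of the system."""
        while not stop.is_set():
            for block in ["block-1", "block-2"]:   # placeholder for a real dirty list
                cache.handle(["flush " + block])
            time.sleep(1.0)

    if __name__ == "__main__":
        stop = threading.Event()
        threading.Thread(target=flusher, args=(stop,), daemon=True).start()
        request = ["incoming request"]
        parse.handle(request)                      # the whole path runs on this thread
        print(request)                             # ['incoming request', 'parse', 'cache', 'disk']
        stop.set()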

I guess the lesson here is: don’t be an extremist. A hybrid approach can solve the worst problems associated with either “pure” approach, but allowing either one to become dogma can be deadly to your real-world effectiveness as a programmer.

One More Mammal

I forgot one other animal we saw at the last lake on Saturday – a bat. At noon. Skimming over the lake. This is, of course, not the sort of behavior that immediately makes one think of bats, so it took us quite a while to realize that’s what it was, but when it got fairly close to where we were sitting the conclusion was pretty inescapable. The leathery wings and lack of a bill really kind of put the bird hypothesis to rest.

Tahoe Report (non-technical)

Still no pictures – soon, I promise – but here, in roughly chronological order, is the tale of last week’s trip.

Last Saturday (June 8) we flew out. We were originally scheduled to fly into Reno via Phoenix, but the Boston to Phoenix flight was cancelled. Apparently there was neither a plane nor a crew to fly one available, due to weather problems elsewhere in the country the day before, and America West had called both of our daytime phone numbers to inform us – at close to midnight. Grrr. Needless to say, neither of us got the voice mail, so the only alternative by the time we showed up was to get to Phoenix via Columbus, Ohio. That’s right: Boston to Columbus to Phoenix to Reno, with a long layover in Columbus. By the time we got to Reno, rented a car, and drove from there to Tahoe City, we were pretty tired, but we made it.

On Sunday, we did a short hike to Five Lakes, near Alpine Meadows. There was actually snow on the ground and on the bushes when we started, but it had all burned off by the time we headed back; when I get around to posting pictures, I’ll include before and after pictures of Cindy in the exact same spot to show the contrast.

Monday through Wednesday I was at the OceanStore/ROC retreat while Cindy went hiking. You can see lots more about the retreat in my previous entry, and pictures from Cindy’s trips will be posted soon. On Wednesday afternoon we went around the eastern shore of Lake Tahoe as far as Spooner Lake, then came back to town for some stuff, and then went around the western shore to South Lake Tahoe.

On Thursday we did our big hike, to the summit of Mt. Tallac via Gilmore Lake. Gilmore Lake was quite beautiful, and marked our first encounter with what we later discovered to be a Mountain Chickadee. Initially we called it the Turd-Eating Chickadee, for reasons that I don’t think I need to elaborate. The trail from the lake up to the summit was complicated by false cairns and some sections of the trail being obscured by snow, and then there was quite an arduous rock scramble at the end, but it was well worth it. The view from the top of Mt. Tallac is absolutely spectacular in almost all directions.

On Friday we took it relatively easy. I was glad for the rest, not so much because I was tired but because I’d picked up a slight pinkish tinge the day before. We went to the Forest Service visitor center, then on a boat where we saw Mt. Tallac from the lake side and boggled that we’d actually been up there. We wound up the day with some quiet strolling and an excellent meal at Camp Richardson.

Saturday was also relatively unexceptional. We hiked around Echo Lake to three other lakes – Tamarack, Ralston, and something else I don’t remember, then took the Echo Lake water shuttle back. Nothing quite as dramatic as some of the other stuff we’d seen, but still excellent hiking.

Here’s a partial list of birds and animals we saw, that we don’t get to see back home (or at least not often):

  • Steller’s Jay (everywhere)
  • Brewer’s and Red-Winged Blackbird (many places)
  • Dark-Eyed Junco (first at Five Lakes, also other places)
  • Western Tanager (first at Five Lakes, also other places)
  • Mountain Chickadee (first noticed at Gilmore Lake, also other places)
  • Osprey (Spooner Lake, Emerald Bay, Mt. Tallac)
  • White-Headed Woodpecker (near Echo Lake)
  • some as-yet-unidentified kind of merganser (visitor center)
  • some kind of diving bird (visitor center)
  • some small yellow-green bird that moved too fast to see details (various places)
  • miscellaneous swallows (esp. in Tahoe City and at Camp Richardson)
  • all sorts of indistinguishable squirrels, chipmunks, and pika (many places)
  • marmot (near Ralston Lake)

We picked up several interesting items, almost but not quite making up for the loss of Little P:

  • a lead-crystal block containing a “bubble sculpture” of a deer (very interesting process to create these, and I’ll probably explain it when I have a picture)
  • a cute quarrystone frog that now sits on my monitor at work
  • a genuine Zuni onyx sculpture of a platypus (yes, I know there are no platypi where the Zuni live; apparently they don’t know that, though, so cope)

All in all, it was a most excellent (and productive) trip. If you ever get a chance to go, especially in the short interval between snow and mosquitos (like we did), I highly recommend it.

Tahoe Report (technical)

The retreat I attended was, as usual, a combined retreat for two projects: OceanStore and ROC (Recovery-Oriented Computing). Also as usual, there was a second retreat going on at the same place and time, for SAHARA, with a few joint sessions etc. I’ve never quite gotten a handle on exactly what the SAHARA folks are trying to achieve, but it’s basically about a low-level network infrastructure that’s shared by multiple providers (especially of the cellphone variety) instead of being controlled by one.

One major bummer at the retreat was that neither David Patterson (he of RISC, RAID, and the highly-regarded computer architecture books) nor Dennis Geels (one of the OceanStore “core crew”) was able to attend, in both cases due to illness. In addition to being deprived of their fine company, this caused several minor rearrangements of the schedule around the talks that they had been scheduled to give. There was still lots of interesting stuff, though, most of which has been very well summarized here. What follows are some of my own personal, idiosyncratic thoughts, beyond what was captured in the official visitor feedback pages (thanks to George C and Aaron B for capturing those).

  • A Utility-Centered Approach to Designing Dependable Internet Services (George Candea)

    This work seems, not to belittle it in any way, to be in a pretty early stage, but its goals fit well with my own preference for making decisions on a quantitative “rational” basis instead of just by the seat of one’s pants. In a nutshell, the idea is to define utility functions based on several orthogonal metrics – e.g. data availability, performance and cost – and use those metrics to create a multi-dimensional design space. Design alternatives at successively finer levels of detail can then be selected by plotting each within the design space and selecting the one with the greatest overall utility according to some formula (e.g. greatest Euclidean distance from the zero-utility origin).

    My own idea about this was that merely assigning weights to the different axes, or combining them using different formulae, would be insufficient due to the issue of design or implementation risk (or quality). It’s highly unlikely that a design alternative would be adequately represented by a single point along any axis; more often, especially with finite schedules and resources, the proper representation would be a probability function. When the axes are combined, they therefore produce a field within the design space rather than a single point. The fields representing alternatives might vary in density or even be discontinuous; they might even overlap with one another. What this approach makes possible, though, is to compare alternatives based on a selectable certainty level. Even more importantly, it allows the decision to be remade whenever the probabilities or schedules or comfort levels change, without having to redo every part of the computation.
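
    To make the “field” idea a bit more concrete, here’s a toy sketch; the metrics, distributions and numbers are all invented. Each design alternative’s position along each axis is a random variable rather than a point, sampling gives a distribution of overall utilities, and alternatives can then be compared at whatever certainty level the decision-maker is comfortable with.

        import math, random

        def utility(sample):
            # Overall utility as Euclidean distance from the zero-utility origin.
            return math.sqrt(sum(x * x for x in sample))

        def utility_at_certainty(axes, certainty=0.9, trials=10000):
            """axes is a list of (mean, stddev) pairs, one per metric (e.g.
            availability, performance, cost), normalized so bigger is better.
            Returns the utility we can be `certainty` confident of reaching."""
            samples = sorted(
                utility([random.gauss(mean, stddev) for mean, stddev in axes])
                for _ in range(trials)
            )
            return samples[int((1.0 - certainty) * trials)]

        # Two invented alternatives: B looks better on average but is riskier.
        alt_a = [(0.8, 0.05), (0.6, 0.05), (0.7, 0.05)]
        alt_b = [(0.9, 0.30), (0.7, 0.25), (0.8, 0.30)]

        for name, axes in (("A", alt_a), ("B", alt_b)):
            print(name, round(utility_at_certainty(axes), 3))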

  • Tapestry (various)

    This wasn’t exactly related to any particular talk, but the issue of the relationship between topological closeness within Tapestry and geographic or IP-network closeness seemed to keep coming up. In particular, Brocade seems to be an attempt to reconcile the two. The same issues arise not only in Tapestry, but also in related Plaxton/Chord/Kademlia/CAN networks. If I were a grad student myself, I’d be seriously thinking about a thesis on how to construct such networks so that the potential for “optimal at one level, terrible at the next” routing was minimized.

  • Simultaneous Insertions in Tapestry (Kris Hildrum)

    First off, I’d like to point out that this presentation had to be put together in a hurry to fill a schedule gap resulting from one of the aforementioned absences, and insertion/membership algorithms are hellishly difficult to explain even with all the time in the world. I’m sure a more followable explanation will be forthcoming soon.

    The important thing about this work is, really, that it’s being done successfully at all. The OceanStore group seems very acutely aware of the problems that high “churn” rates in a network can bring, and they’re making significant progress on protocols that can survive and yield correct results even in such an environment.

  • The OceanStore Write Path: A Quantitative Analysis (Sean Rhea)

    There’s a lot of interesting stuff here, but there’s one particular observation that I think deserves special mention: hashing isn’t free. Calculating something like an SHA-1 hash has all the system effects of a copy (which we all know by now to avoid) plus computational overhead, and hashing speed can actually become the main bottleneck in this type of system. Yeah, I know, to some it might seem obvious, but I see many others falling into the trap nonetheless. It’s worth the effort to manage hash information so it doesn’t need to be regenerated, to consider weaker (but cheaper) hashes, or to adopt algorithms that don’t depend on such hashes.
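
    A small sketch of the obvious mitigation (the names are invented and there’s nothing OceanStore-specific here): compute each block’s digest once, when the block is first needed, and carry the cached digest around with the block instead of rehashing on every use.

        import hashlib

        class Block:
            """A data block that remembers its own SHA-1 digest, so the hash is
            computed exactly once rather than being regenerated on every use."""
            def __init__(self, data):
                self.data = data
                self._digest = None

            @property
            def digest(self):
                if self._digest is None:              # pay the hashing cost once
                    self._digest = hashlib.sha1(self.data).digest()
                return self._digest

        store = {}

        def put(block):
            store[block.digest] = block               # digest computed here...

        def get(digest):
            return store.get(digest)                  # ...and never recomputed for this block

        put(Block(b"some user data"))                 # the only place this block gets hashed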

  • Towards Building the OceanStore Web Cache (Patrick Eaton)

    As I pointed out in my feedback, the historical record of adoption for distributed filesystems is pretty bad, and systems to handle email attachments haven’t fared much better. IMO, the OceanStore web cache is by far the most likely of the applications developed so far to actually benefit real people in the real world (if that’s a goal). Making that happen might involve some not-so-sexy hacking to deal with crappy web consistency models, but I think it’s a very important direction and I hope this work continues.

  • 100-Year Storage (various)

    There was supposed to be a talk, or perhaps a breakout session, about the reasonableness of designing a storage system to last 100 years, but it kind of got lost in the shuffle. Given that many technological advances are likely to occur in the next century (duh) I think it’s critical that any system designed for such a timescale must also be designed to evolve over time, but that introduces a whole bunch of difficult problems. Dealing with multiple protocol versions is a major pain even in a very small network (been there, done that); is it even possible in a network of the size anticipated for OceanStore? Regenerating lost fragments that have been deliberately scattered all over the world is already a hard problem in OceanStore; how much harder is it to migrate fragments to a new format, hash type, or key length without disrupting anything? I think there’s a lot of brainstorming left to be done in this area.

  • Time Travel Data Store (various)

    Everywhere I go nowadays, whether it be to startups or to academic research labs, the same idea seems to keep coming up: a data store that lets you move backwards or forwards to any arbitrary point in time (not just to when you took the last snapshot). OceanStore could potentially act in this way, but there are pieces missing. Again, I think this is a very fruitful area for more research.