Verizon’s DNS Sucks

A while ago, Verizon started offering Fios – their new fiber-to-the-home service – for less than we had been paying for DSL, so we signed up. The service really is pretty zippy; on one occasion recently I was able to sustain over 600KB/s (4.8Mb/s) for the duration of a 32MB download. However, I quickly noticed that pages were often slow to load because of the time it took to translate host names into IP addresses. Not surprisingly, others have also noticed this deficiency in Verizon’s network. Oddly, their public servers (4.2.2.1 through 4.2.2.5) are pretty well respected and often suggested as an alternative for people who are having problems with other ISPs’ name servers, but those aren’t the ones Verizon tells your computer to use.

Well, I can override what Verizon tells my computer – in fact my whole home network – to use. I have in fact done so, and I can report that surfing has improved considerably as a result. I don’t know why Verizon points subscribers’ machines toward servers that significantly degrade those subscribers’ net experience, and thus cast their product in a bad light, when they themselves already have far better servers available, but that seems to be what they do.
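
For anyone curious how much of their page-load time is going to name resolution, one way to check is to time the lookups directly. Below is a minimal sketch (my own, nothing Verizon supplies) that times getaddrinfo() for each host name given on the command line; run it with the ISP-assigned servers, switch your resolver settings (for example in /etc/resolv.conf or in your router’s DNS configuration), and run it again with fresh host names. The file name and build line are just examples.

    /* Minimal sketch: time getaddrinfo() for each host name on the command
     * line, to compare resolution latency before and after switching name
     * servers.  Repeated lookups may be answered from a local cache, so use
     * fresh host names for a fair comparison.
     * Build (Linux/BSD): cc -O2 -o dnstime dnstime.c  (older glibc may need -lrt) */
    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <time.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>

    int main(int argc, char **argv)
    {
        for (int i = 1; i < argc; i++) {
            struct addrinfo *res = NULL;
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            int rc = getaddrinfo(argv[i], NULL, NULL, &res);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double ms = (t1.tv_sec - t0.tv_sec) * 1000.0
                      + (t1.tv_nsec - t0.tv_nsec) / 1.0e6;
            printf("%-30s %8.1f ms  %s\n", argv[i], ms,
                   rc ? gai_strerror(rc) : "ok");
            if (res)
                freeaddrinfo(res);
        }
        return 0;
    }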

By the way, for anyone else considering a similar trick: be very careful whose name servers you use. An unscrupulous person could easily set up a name server that would, for example, resolve your online banking system’s host name to an IP address that is actually their own password-stealing machine. Only use name servers that you have very good reason to trust.

Fast Method Dispatch

Yesterday, Joel Spolsky wrote about how Ruby is slow. One of his key theories is summed up by the following quote.

Without knowing much about the implementation of Ruby, I would guess that the biggest issue is around late binding and especially duck typing, which prevents type inference or strong typing, which means that function calls will always be slow because you can never get something compiled down to the point where a function call is just a single CALL instruction (on x86)… you always have to be exploring the object, possibly even scanning a hash table to find the function you want to call. Calling methods on objects is really, really, really, really common, especially in pure OO languages, and that bit of the code needs to get down to at least a CALL indirect on the CPU for Ruby to offer the same performance as languages where the type of the object can be determined at compile time and therefore the instruction where you’re jumping to can be gotten either at compile time (like in C) or with a single indirection through a single vtable (like in C++).

“Duck typing” is a bit of a sore spot for a platypus, for which LayEgg() works but FlapWings() doesn’t, but Joel’s usually wrong about stuff (like thinking that good developers care more about Aeron chairs than a good salary, as though they don’t know their market value and can’t buy their own darn chairs if they’re paid appropriately), so I didn’t think much of it. Then he posted a link to an article about how to make method dispatch faster, and that piqued my interest. So, naturally enough, I decided to write a little test program. Basically it tries a bunch of different kinds of method dispatch, doing many method calls and using the cycle counter for timings. The dummy method being called doesn’t really do anything, but does have a standard prologue and epilogue. Here are some results.

  • Direct call (unconditional always-taken jump). This was just to get a baseline, and yielded about 10 cycles per iteration.
  • Dispatch table containing just the target address (indirect always-taken jump). This caused the time to go up to 14 cycles per iteration.
  • Dispatch table containing a JMP instruction to the target (two always-taken jumps). Back down to 12 cycles.
  • Avi Bryant’s method (unconditional always-taken jump, compare, conditional jump). Back up to 14 cycles. Yes, this was worse than the previous well-known and flexible method.
  • Conditional before the jump (compare, conditional jump). Either 12 or 14 cycles, depending on whether it was coded so that the positive or negative branch was actually taken each time.

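The original test program isn’t shown here, so below is a rough sketch of my own, under obvious assumptions, of the first two cases: a direct call versus an indirect call through a one-entry dispatch table, timed with the x86 cycle counter. The dummy target, iteration count, and build line are invented, RDTSC isn’t serialized, and the other variants (the JMP trampoline and the compare-before-jump versions) would need a bit of assembly, so treat the output as ballpark per-iteration figures rather than a reproduction of the numbers above.

    /* Sketch only: compare direct vs. table-indirect dispatch using RDTSC.
     * x86/x86-64 with GCC or Clang.  Build: cc -O2 -o dispatch dispatch.c
     * (inspect the generated asm to confirm the calls weren't optimized away). */
    #include <stdio.h>
    #include <stdint.h>

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Dummy target: does nothing, but keeps a normal prologue/epilogue. */
    static void __attribute__((noinline)) dummy(void)
    {
        __asm__ __volatile__("");   /* discourage the compiler from discarding calls */
    }

    typedef void (*fn_t)(void);
    fn_t table[1] = { dummy };      /* dispatch table holding just the target address */

    #define ITERS 10000000ULL

    int main(void)
    {
        uint64_t t0, t1, t2;

        t0 = rdtsc();
        for (uint64_t i = 0; i < ITERS; i++)
            dummy();                /* direct CALL */
        t1 = rdtsc();
        for (uint64_t i = 0; i < ITERS; i++)
            table[0]();             /* indirect CALL through the table */
        t2 = rdtsc();

        printf("direct:   %.1f cycles/iteration\n", (double)(t1 - t0) / ITERS);
        printf("indirect: %.1f cycles/iteration\n", (double)(t2 - t1) / ITERS);
        return 0;
    }
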
Observation one is that Avi’s trick doesn’t actually seem beneficial, and that’s before you consider that it relies on compiling the target function in a special way. Compiling the caller differently is at least as good, and the second dispatch-table method is even better. Observation two is that none of this really matters that much. Two cycles? Give me a break. I’ve written plenty of code that really did have to count cycles, and that kind of thing does have its place, but this isn’t it. For any higher-level language, many other internal aspects of how the compiler and run-time libraries work will be much more important than the method-dispatch differences observed so far.

Observation three is the killer. I don’t think Avi’s technique really works for true duck typing. I started to realize this as I was writing the test program and began wondering what condition the target routine should test. Avi says it’s just a CMP, but comparing what to what? In a truly duck-typed language the target function is going to have a pointer to an object, and then it has to figure out whether it’s the right method for that object. Even in a statically typed OO language that’s going to involve a bit of pointer chasing to find the right vtbl and so on. In a duck-typed language, where the actual object might have absolutely no inheritance relationship with the class on which the guessed method is defined, it’s trickier. At the very least you’ll have to find the hash table, find the index that would be used if this were the correct method, and do two compares – one for the method-name pointers and one for the method-body pointers. This is the price you pay for the flexibility of duck typing, but it still doesn’t validate Joel’s theory. Even with two compares, method dispatch is not likely to be the problem here.
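
To make that concrete, here is a small sketch, entirely my own rather than Avi’s code or any real VM’s, of what such a guarded guess could look like in C: find the slot the selector would occupy in the receiver’s method table, compare both the method-name pointer and the method-body pointer, and fall back to a full lookup on a miss. Every name in it (obj_t, send, full_lookup, the selectors) is invented for illustration.

    /* Sketch of a two-compare dispatch guard for a duck-typed receiver.
     * All types and names here are made up for illustration. */
    #include <stdio.h>
    #include <stddef.h>

    typedef struct obj obj_t;
    typedef void (*method_t)(obj_t *self);

    typedef struct {                 /* one method-table slot */
        const char *selector;        /* interned method name */
        method_t    body;            /* compiled method body */
    } slot_t;

    struct obj {
        slot_t *slots;               /* the object's method hash table */
        size_t  mask;                /* table size minus one (power of two) */
    };

    static void duck_quack(obj_t *self)      { (void)self; puts("quack"); }
    static void platypus_layegg(obj_t *self) { (void)self; puts("egg"); }

    /* Slow path: scan the whole table for the selector. */
    static method_t full_lookup(obj_t *self, const char *selector)
    {
        for (size_t i = 0; i <= self->mask; i++)
            if (self->slots[i].selector == selector)
                return self->slots[i].body;
        return NULL;                 /* "message not understood" */
    }

    /* Fast path: the caller passes the slot index and body it is guessing;
     * two pointer compares either confirm the guess or fall back. */
    static void send(obj_t *self, const char *selector, size_t idx, method_t guess)
    {
        slot_t *s = &self->slots[idx & self->mask];
        if (s->selector == selector && s->body == guess)   /* the two compares */
            guess(self);
        else {
            method_t m = full_lookup(self, selector);
            if (m) m(self); else puts("doesn't understand");
        }
    }

    int main(void)
    {
        const char *SEL_QUACK  = "quack";    /* interned selectors */
        const char *SEL_LAYEGG = "layEgg";

        slot_t duck_slots[2]     = { { SEL_QUACK,  duck_quack },      { NULL, NULL } };
        slot_t platypus_slots[2] = { { SEL_LAYEGG, platypus_layegg }, { NULL, NULL } };
        obj_t duck     = { duck_slots, 1 };
        obj_t platypus = { platypus_slots, 1 };

        send(&duck, SEL_QUACK, 0, duck_quack);      /* right guess: fast path */
        send(&platypus, SEL_QUACK, 0, duck_quack);  /* wrong guess: slow path */
        return 0;
    }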

Contention and Communication

In a conversation at work today, I happened to hit upon one of those statements that’s very simple in itself but represents an entry point into a fairly complex and cohesive set of ideas. My favorite example of this is Ockham’s Razor: entities should not be multiplied beyond necessity. This is usually thought of as a principle of logic, but it can also be applied to management or software engineering or aesthetics. My pale attempt at achieving such economy of phrase is this:

Contention cannot be resolved without communication.

Yeah, I know, it’s nowhere near as powerful, but it would probably sound really cool in Latin. The original context, as befits a work conversation, was technical, but there’s a sociopolitical aspect to it as well. I’ll delve into both after the break.