Cache-friendly Memory Allocator

I was thinking about the code I’m writing, and thought of something that would be Nice To Have ™. I’m sure it exists somewhere, but I don’t know where, so maybe if I describe it people can help me find some references. What I’m thinking of is a cache- (and NUMA-)friendly memory allocator. Often, especially in a server program, you might be allocating/deallocating short-lived objects quite frequently (in my case it’s request objects). On a cache-coherent multiprocessor, it would be nice if the allocator would preferentially reuse space that had last been used on the same processor making the allocation request, to maximize cache warmth and minimize memory traffic. This would require a per-processor free pool that’s tried first, with a fallback either to a shared pool or to the “next” processor’s pool with processors arranged in a circle.

If this is an old idea, particularly if you know of an actual implementation, please let me know. If it’s a new idea, you’re welcome. ;-)

Happy Birthday to Me

It’s my birthday today. My age is a prime number; the last time that happened was six years ago, and it won’t happen again for another four. How old am I?

Canard of the Day

Some people need a broader perspective. Consider this quote from Joel:

The claim that SOAP is bad because the wire format is ugly and hard is like claiming nobody should use Pentiums because their instruction sets are so much more complicated than the instruction set for the 8086.

So you need something to contrast with a Pentium instruction set and you come up with…the 8086? Yeah. That’s like looking for a contrast to Budweiser and coming up with Bud Light. Far from making Joel’s point or inspiring confidence in the breadth of his knowledge, the example leads readers – this reader, anyway – to wonder just how long Joel would sit in front of a freshly-installed Linux box wondering where the “Start” button is. Those who make a habit of preaching to others about technology should IMO know more than one OS, one instruction set, one type of programming. I’m beginning to wonder whether that’s true for Joel.

BTW, the fact that the x86 instruction set is an ugly mess is a problem, even though we have compilers to do most of the nasty parts for us. As an instruction set gets messier, it becomes exponentially more difficult for compilers to produce code that both behaves correctly (even in the more obscure cases) and performs well. Perhaps even worse, compiler writers who have to deal with messy instruction sets don’t have much time left over to implement generic higher-level optimizations or add new features. There are actually good arguments to be made for getting away from the Pentium instruction set; even Intel no longer executes that instruction set internally in either their current processors or the upcoming Itanic.

Surfing Beats Working

OK, I know I’ve made disparaging comments in the past about people whose weblogs are just bookmark lists formatted differently, but I don’t have anything original to say right now and I just found one of the best personal web pages ever: Guillaume Dargaud. Now, when I say good I certainly don’t mean the presentation, which is so bad as to be outright offensive – poor color/layout choices, fade effects, rollovers, animated GIFs. If you can think of a “cool for about two seconds” webtoy, he’s probably used it. I’m talking about the content, which includes many – and I do mean many – pictures of Antarctica and Tasmania and New Zealand and climbing everywhere, plus a substantial archive of high-quality humor. Definitely worth a browse, but you might want to set up some filters to avoid some of the visual nastiness.

Using Python for Performance

David McCusker is at it again (last reference). In a response to mail from Zooko about Python, he included this comment:

I could probably use Python, but it merely doesn’t fit my current needs.
I do a lot of work on low level runtime performance, and need access.
I want my future high level glue language to let me play with low stuff.

IMO that’s a very poor excuse for not using Python. As many of my readers know, I also work on a lot of performance-critical stuff. A lot of what I do is at an even lower level than what David does, and I’m still comfortable using Python for a lot of my development. One of the nicest things about Python is that it has good support for compiled dynamically-loaded extension modules. I’ve heard that Ruby is even better in that regard, and it might be true, but Python’s still pretty good. Consider my current project as an example of how to use Python effectively for performance-critical work. The original prototype consisted of Python higher-level logic on top of several extension modules (written in C) to handle things like interfaces to kernel components. Over time, I’ve replaced each Python module in turn with a pair of C modules: one to do the actual work, and another to provide a Python interface to the first. This method provides two benefits:

  • At each stage, the as-yet-unconverted Python modules still work just as they always did, unaware that the modules they call have changed. There is always a fully functional version of the whole program.
  • Having Python-interface modules at all levels of the system makes it extremely easy to write complex and powerful unit/regression tests.

By the time I’m done, which should be very soon, I’ll have a complete native-code version of my project that has benefited greatly from the cognitive/experimental advantages of using a higher-level language in the early stages, plus the testing advantages from the conversion process. It’s a better product, completed more quickly, than would be the case if I’d written it all in a lower-level language from day one. The idea that Python, or languages like it, do not have a place in performance-critical projects seems totally wrong to me.

Maybe David meant something else by being able to play with low-level stuff from a high-level language. Maybe he wants to do it all in a single language, instead of integrating code written in two different languages. If so, remind me to dig up one of my many rants about the broken “one size fits all” meme.

Moderation Systems

There’s a fairly interesting discussion on about moderation systems in an online forum. You can go read it if you like, but I’ll try to summarize my own views here:

  • An effective moderation system depends on strong identity/authentication to prevent astroturfing.
  • Moderation activities should be visible, because invisible activities (that affect others’ reputation) are the moderation-abuser’s best friend.
  • There should be a limit to how much one person can affect another’s reputation, to prevent both “karma wars” (two people modding each other down) and “mutual admiration societies” (two people modding each other up) from getting out of control.

My suggestion, therefore, is that there be a cap – two caps, actually – on how many positive and negative moderations a person A can make of another person B’s posts, per time period, and “excess moderations” cause all moderations by A of B in the same direction to be “scaled down” so that the total influence equals the limit. For example, let’s say the limit is 5. If person A moderates 3 of person B’s posts down, nothing in particular happens. However, if s/he does so 7 times, each down-moderation is scaled down to a value of 5/7 so that the total value is 5. A corresponding – but perhaps different – limit would also be enforced for up-moderations. This would effectively limit the influence that A could have on the total rating of B’s posts, which constitutes B’s reputation in such a system.

I’d be interested in hearing more about this idea from people who know more about this stuff. Feel free to send email or leave a note on PlatSpot.

Filesystem Fragmentation

Every once in a while I search for my name on Google (yeah, like you need a link to that). It’s not so much to find others’ references to myself, since most of the hits are usually to stuff that I wrote myself, but every once in a while such a reference does pop up. Yesterday, I found this one:

There’s a long thread on MacinTouch about OS X and fragmentation. Worrying about fragmentation is so Personal Computer. A few people in that thread set the whiners straight about just how traditional unix file systems handle file writes so as to minimize fragmentation although I’m not entirely sure that’s how HFS+ works. Jeff Darcy would be sure to know.

Well, Steve, as gratifying as I find your confidence in me, I have to admit that I don’t know much about how HFS+ works. Truth be told, most of my Mac work was done before HFS+ even existed. Damn, I’m old.

However, there is something in your comment that I feel deserves a reply. UNIX filesystems tend to do a lot to prevent fragmentation and generally reduce head motion – preallocation, cylinder groups, blah blah blah – but it does still occur and most filesystems don’t actually do all that much to undo it once it exists. Consider this quote from one of IBM’s many excellent articles about filesystems:

ext2 filesystems take a long time to exhibit signs of fragmentation. However, I would argue that fragmentation is still a big problem, because although ext2 does not get fragmented easily, fragmentation is a one-way, cumulative process. That is, while ext2 fragments slowly, it cannot defragment itself. In other words, any often-modified ext2 filesystem will gradually get more and more fragmented, and thus slower. Even worse, there are no production-quality ext2 filesystem defragmenting programs currently available.

There are defragmentation tools for ext2, but a lot of “people who should know” don’t seem to trust them very much. You’ll often find a suggestion to dump/restore a filesystem periodically to reduce fragmentation. The same story applies in general to FFS, UFS, etc. I don’t know about JFS/XFS/Reiserfs/etc. but I will say this: after-the-fact defragmentation functionality integrated into the filesystem isn’t a whole lot different than the same functionality implemented as a separate program. In some ways it’s worse, since it significantly increases the complexity, size, and potential for error in a kernel component where those are all much worse problems than in user space.

In conclusion, UNIX filesystems are not immune to fragmentation, though they are more resistant to it than FAT (NTFS is really much more like a UNIX filesystem than anything else, in practically all respects). Furthermore, the tools for fixing fragmentation on UNIX are actually worse than their DOS/Windows counterparts. However, worrying about fragmentation is still “so Personal Computer” because real computers are attached to real RAID storage systems with huge caches and internal optimization so that host-based filesystems’ notions of physical location and fragmentation are all wrong anyway. Physical placement has become a storage-side problem, not a host-side problem, and that will be even more the case when the next generation of “storage virtualization” cruft becomes available.

Triangles Everywhere

A lot of people on the net have probably heard of “triangles” like these:

  • Fast, cheap, reliable…pick two.
  • Beauty * brains * availability = a constant

Pursuant to a conversation with Bram Cohen, who probably doesn’t realize that I think he’s a great guy, I thought of another one:

  • Work on cutting-edge stuff.
  • Set your own terms and deadlines.
  • Draw a nice steady paycheck.

As always, there’s a tradeoff involved. For example, I draw a very nice paycheck and I work on stuff that’s pretty close to the cutting edge, but in return I have ceded a lot of control over a lot of stuff like how my ideas are used. A lot of academic folks are even more aggressively on the cutting edge and have a little bit more “freedom of exploration” but have given up something in the paycheck department. Open-source heroes tend to do really well in the terms-and-deadlines department but also do poorly in the paycheck department and often aren’t nearly as close to the cutting edge as they think. These are all valid choices. The point, which I tried to make recently but which had not fully congealed in my mind yet, is that for most of us some sort of tradeoff is necessary. Sure, the triangle model breaks down in some exceptional cases; Linus Torvalds, for example, seems to do pretty well by all three criteria. However, such “dream jobs” are extremely rare, and they’re not exactly handed out like door prizes. Usually, they’re the result of both hard work and good luck, having established the right reputation with the right people under the right circumstances. Usually that reputation is a combination of a proven track record for turning wacky ideas into reality, plus some reasonable degree of business understanding, plus not being a total nut job. The right people might be a CEO, a lab director, or a VC. The right circumstances would include a market for your particular idea, at the time it’s likely to be complete. (That last part is often missed; I’ve seen many great ideas fail because they were reached tangible form too late, and almost as many because they were too early.) If you bring all of these things together, maybe you can get a dream job, but the other 999,999 out of a million people will have to decide where in the triangle they’re most comfortable being.

User-Hostile Operating Systems

I spent a good part of yesterday installing Linux (Debian, if you care) on my relatively-new home machine. What a pain in the ass. I’m sorry, folks, but anyone who says that installing/configuring Linux is no harder than installing/configuring Windows is a liar. To be fair, most of the problems I had were XFree86 rather than Linux itself, but Joe Average wouldn’t even be able to make the distinction and shouldn’t have to. At one point I had to uninstall XFree86 entirely and start all over again, then I had to lie and tell X that my video card is a GeForce 2 GTS even though it’s really a GeForce 2 Ti, and manually hack the XF86Config file to get the right video modes, etc. It’s nothing that I personally have too much trouble with, but it’s totally unacceptable from a normal consumer’s point of view. When Linux can recognize some of the world’s most common network and video cards and make sure the proper drivers get installed and loaded, on its own without extensive manual intervention, it might be ready for the consumer desktop. You’d think that at least some of the technically-minded Linux advocates who spend so much time creating new flavors of window-manager eye candy would take just a little time to bring Linux up to Microsoft levels in some of the more important user-friendliness areas. If I install a new card then the next time Windows (any flavor) boots I’ll get a nice little dialog helping me find/install/configure the appropriate drivers. The next time Linux boots I get squat. Go ahead, tell me again how the two are comparable.

(Very Slightly) Enhanced LoTR Test

In my ongoing work-avoidance project, I added some pictures to my LoTR personality test. Maybe some day I’ll add some more characters, and better descriptions, and better questions, and I’ll put the questions/answers/etc. into a database instead of a lame-o text file…nahhh. That wouldn’t be work avoidance, now would it?