I was thinking about the code I’m writing, and thought of something that would be Nice To Have™. I’m sure it exists somewhere, but I don’t know where, so maybe if I describe it people can help me find some references. What I’m thinking of is a cache-friendly (and NUMA-friendly) memory allocator. Often, especially in a server program, you allocate and free short-lived objects quite frequently (in my case it’s request objects). On a cache-coherent multiprocessor, it would be nice if the allocator preferentially reused space that was last used on the same processor that is making the allocation request, to maximize cache warmth and minimize memory traffic. This would require a per-processor free pool that’s tried first, with a fallback either to a shared pool or to the “next” processor’s pool, with the processors arranged in a circle.
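To make the idea concrete, here is a rough sketch in C of the "circle" variant for fixed-size objects. Everything named here is made up for illustration: `NCPUS`, `OBJ_SIZE`, `req_alloc`, and `req_free` are hypothetical, and the caller passes the processor number in explicitly (a real version would have to ask the OS which processor it is running on). It also leaves out locking entirely, so as written it is only safe single-threaded; a real allocator would need per-pool locks or lock-free lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NCPUS    4    /* assumed processor count (hypothetical) */
#define OBJ_SIZE 256  /* assumed fixed object size (hypothetical) */

/* A freed object is reused as a list node, so the pool needs no extra memory. */
typedef struct free_node {
    struct free_node *next;
} free_node;

/* One LIFO free list per processor: the most recently freed object is
 * handed out first, since it is the most likely to still be in cache. */
typedef struct {
    free_node *head;
} cpu_pool;

static cpu_pool pools[NCPUS];

/* Pop the newest free object from a pool, or return NULL if it is empty. */
static void *pool_pop(cpu_pool *p) {
    free_node *n = p->head;
    if (n != NULL)
        p->head = n->next;
    return n;
}

/* Allocate for `cpu`: try its own pool first, then each "next" pool
 * around the circle, and only then fall back to malloc. */
void *req_alloc(int cpu) {
    assert(cpu >= 0 && cpu < NCPUS);
    for (int i = 0; i < NCPUS; i++) {
        void *obj = pool_pop(&pools[(cpu + i) % NCPUS]);
        if (obj != NULL)
            return obj;
    }
    return malloc(OBJ_SIZE);
}

/* Free on `cpu`: push onto that processor's pool, so the next allocation
 * on the same processor gets this (probably cache-warm) memory back. */
void req_free(int cpu, void *obj) {
    assert(cpu >= 0 && cpu < NCPUS);
    free_node *n = obj;
    n->next = pools[cpu].head;
    pools[cpu].head = n;
}
```

With this sketch, an object freed on processor 0 is the first thing the next `req_alloc(0)` returns, and a processor whose own pool is empty borrows from its neighbor before going back to the system allocator. The shared-pool fallback would look the same, with one extra pool checked after the local one instead of the walk around the circle.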

If this is an old idea, particularly if you know of an actual implementation, please let me know. If it’s a new idea, you’re welcome. ;-)