At the end of my original actor-model post, I suggested that I might post some example code to show how it works. More importantly, I want to show how it avoids some common problems that arise in the lock-based model, and makes others at least a little bit more tractable. Before we dive in, then, I need to explain some of what those problems are and how they happen.
All of a sudden, everybody is talking about concurrent programming. The driving factor, of course, is the fact that multi-core processors have moved beyond the server room, into the machines people use for web surfing and games. Whether that’s a good kind of evolution is a question best left for another time. Right now, much of the debate is over how to make it possible for programmers to take full advantage of all that extra processing power. Because of what’s driving it, most of this conversation is focused on systems where all processors share equal access to a single memory space. In fact, neither the access times nor the memory layout are always completely uniform, but that’s not really important here. In the past, I’ve written quite a bit about programming in this kind of environment. It’s probably why many of you are here, but today I’ll be going in a slightly different direction. I say “slightly” because I won’t be going to the other extreme – internet scale with dozens-of-millisecond latencies and high node/link failure rates – either. For now, I’ll be somewhere in between – clusters of machines with separate memory, connected with some kind of modern interconnect such as InfiniBand or 10GbE.
Puzzle questions are useless in interviews. Yes, I know there’s a real cult around them, and I won’t claim they never have any value, but the conditions for their successful use occur so rarely that it’s not worth qualifying that original statement. In the vast majority of cases, puzzle questions are a way for incompetent interviewers to kill time instead of getting real information. To see why, let’s consider what has to happen for a puzzle question to provide actual value.
First, you have to pick a good question. Actually you have to pick more than one, in case the candidate already knows your first one. After all, there are whole sites devoted to collecting these, and people who have practically made a career out of solving puzzles instead of doing real work. For similar reasons, trick questions aren’t very useful either; you don’t learn much about a candidate who just gives you the answer because they’ve seen it before. The question has to be relatively straightforward, but difficult or complex enough that you’ll learn something from watching the candidate solve it . . . but not so difficult or complex that it consumes the entire interview. The question should be one for which the answer, or the path to the answer, is somewhat domain-relevant, but not so domain-specific that only you and your office buddies are likely to know it off the tops of your heads. Do you see yet how difficult it is to find a truly useful question?
Now you have to administer the question. Puzzle questions can’t be about testing technical knowledge. If you want to test technical knowledge, ask technical questions. I recently encountered a high-level consultant who didn’t even know what consistent hashing is, in a context where it’s pretty much a standard technique. Had it been an interview, I could have learned more by asking a candidate to define or explain consistent hashing than by asking any number of puzzle questions. Every field has such key terms or concepts. Figure out what yours are, and ask about them directly instead of playing games. Back to our point, the purpose of a puzzle question must be something else. Maybe it’s to test how the candidate combines ideas, or how well they work with someone else toward a solution. How many engineers are really able to evaluate such things? How many are able to remove themselves from their own preconceived idea of the solution method or result to conduct such an evaluation without injecting their own bias into the process? That’s exactly the kind of thing we’re known to do poorly. Someone with specific training in psychology or management can do it right, but the average engineer without such training has almost no chance.
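Since consistent hashing is my example of a key concept worth asking about directly, here’s roughly what a candidate who knows it could sketch on a whiteboard. This is a minimal illustration, not any particular system’s implementation; the node names, virtual-replica count, and choice of MD5 are all arbitrary.

```python
import bisect
import hashlib

def _hash(key):
    # Map any string onto a large integer "ring".
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent hashing: each key belongs to the first node
    clockwise from its hash point, so adding or removing one node only
    remaps the keys in that node's arc instead of nearly everything."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual points per real node, to smooth the distribution
        self._ring = []            # sorted list of (hash point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}:{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key):
        # First ring entry at or after the key's hash point, wrapping around.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("some-key")   # one of the three nodes
```

The point of the technique is the removal behavior: if node-b dies, only the keys that were on node-b move, which is exactly the property a candidate should be able to articulate.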
So, by some miracle, you’ve been able to select a good question and administer it well. Now you have to evaluate the response. The question represents a subject you know well, that you hope the candidate doesn’t know as well, and you’re asking them to perform on the spot with you looking on critically. How many of us (engineers) are comfortable getting up at a whiteboard and doing even the most familiar things in front of strangers under time pressure? How many specifically write code under those conditions (since coding questions are the most common sub-genre)? Most take breaks to check the manual for the specifics of a call, go back and rewrite portions three times, insert five syntax errors in even a trivial code fragment, etc. Now, how many of us are capable of accounting for the fact that somebody else’s “inferior” performance is the result of something other than their being less intelligent than ourselves? Again, that’s a notorious weakness. In reality, most of us will systematically underestimate the candidate’s ability relative to our own, rejecting perfectly good programmers because they weren’t good trick ponies as well.
So, with all the pitfalls in selecting a question, administering it, and evaluating the results, how often do you think it happens that the puzzle question provides good value for the time spent? Almost never. Yes, it could happen, but you’re just about as likely to get struck by lightning. Counting on either event would be a poor basis for something as important as a hiring decision.
This being the internet, I’m sure someone will try to claim I’m just venting because I flubbed a puzzle question. Wrong. As far as I know, my performance on such questions has never prevented me from getting to the next round or getting an offer, and has often contributed positively to such outcomes. The one person who asked such questions this time around might well have been among those exceptional few who can actually avoid the pitfalls I’ve mentioned. I’m writing this more out of pique at colleagues I’ve seen waste valuable time on this silly gamesmanship, leaving our group chronically understaffed because practically nobody could pass such a gauntlet. In consequence, I’ll offer this piece of advice not for interviewers but for candidates: any time an interviewer relies on a puzzle question, take the opportunity to re-evaluate whether they are good enough for you. The odds go down with every puzzle question you’re asked.
I kind of promised – myself, anyway, and perhaps others – that I’d write some of this down once I’d gained some distance from it. Now that I’m back from the Big Michigan Trip, it seems like a good time.
The market really is pretty bad. I’m lucky that over the years I’ve met a bunch of people who are a lot better at this networking thing than I am, and worked in some areas that are pretty lively right now, but I know a lot of people who are having a harder time. It really is luck, by the way. As I get older it seems like all those contacts become more valuable, but it’s not the contacts I make now that matter. It’s the ones I made ten years ago, when I had no way of knowing, when I was just trying to do the right thing helping and talking to others. Maybe that’s more karma than luck, but either way it’s not skill or planning on my part. I know too many people who are just as qualified, who have tried just as hard, who are having a tougher time of it.
Here are some other random observations.
- Pre-interviews, whether on the phone or over coffee or in person, seem almost de rigueur nowadays. Sometimes there will even be two, before you even get to the real first round.
- There are more rounds, spaced further apart, than there used to be. There are also more delays to check references, more delays constructing an offer, etc.
- Many companies are posting jobs that, on closer inspection, turn out not to be real. Personally I find this rather scummy, and it’s no surprise that a certain three-letter company is “leading to the bottom” in this area.
- The Boston area is less of a tech-startup hub than it used to be. In my own specialties, I found a lot of smaller companies doing stuff in Silly-Con Valley, in Seattle, in Austin, all over the place, but relatively few here. The work’s being done; it’s just being done within bigger companies. Between that, and news of VCs pulling out or shutting down, I seriously wonder whether we’ll continue to be the #2 area for this kind of thing. Maybe we’ve lost that status already.
- You’re still better off avoiding recruiters if you can. I know there are some good people working in the field, several of whom have been extremely helpful to me in the past, but I didn’t get one lead that way this time around. In fact I got -1, because one guy deliberately poisoned a likely opportunity. He’d mentioned a company five years ago and done nothing since, but still felt entitled to a fee even though the company later reached out to me on their own initiative; when he found out, he triggered their “drop candidates if there’s any question about referral fees” policy. I won’t give the name here but, y’know, I might give an honest answer if anyone asks me for it while we’re having a beer together or something. I might also name some people I think are better. There’s that karma thing again.
Join me tomorrow, when I take on another job-hunt-related subject: puzzle questions.
Probably only a very few of my readers remember these.
Each of those black boxes is an enclosure containing two RLX Technologies blade computers. Everybody knows about blade computing now, but back in 2001 these Crusoe-based babies were the whole game and Chris Hipp was the guy who made them happen. His insights about power efficiency and density and ease of management for large “farms” of small machines changed the industry, and had a very specific echo in SiCortex. Besides that, Chris was a great guy to work with and very generous about letting even someone like me get their hands on what must have been scarce prototypes. His sudden passing at 49 is a loss to computing, and to the world in general.
Recently I found two egregious homophone errors in serious documents. One, in a spec, used “undo” where “undue” was meant; I don’t remember the other. As I was looking for something to explain the difference between homonyms, homophones, and so on, I found a funny story based on examples.
It felt like I had a pistil pointed at my aye browse. No, it felt like he was pointing a canon at me. Even if I ducked quickly, I doubted the missal would have mist.
The whether was good, good enough for a pyknic, so I putt on my shoo and went outside for some air. A large mousse walked by, causing a minor toxin and forcing a frightened creek from my friend.
Of course, no post about this kind of thing would be complete without a link to Ladle Rat Rotten Hut…
WANTS PAWN TERM DARE WORSTED LADLE GULL HOE LIFT wetter murder inner ladle cordage honor itch offer lodge, dock, florist. Disk ladle gull orphan worry Putty ladle rat cluck wetter ladle rat hut, an fur disk raisin pimple colder Ladle Rat Rotten Hut.
…or to Staying Positive.
I awoke rested and comfortable this morning, having suffered from somnia these past few weeks. Dreams of requited love still in my head, wishing they didn’t stretch to finity. I spent an ordinate amount of time in bed before getting on with the day. Looking in the mirror I noticed I was sheveled and ready to get started.
Astute readers might have noticed that my job-hunt post is no longer being forced to the top of the page, and guessed at the reason. I’ll cut right to the chase: as of today, I’ve accepted a job at Red Hat to work on a cloud filesystem. What is a cloud filesystem? Well, that’s the fun part. I have some ideas, other people have some ideas, and part of what Red Hat will be paying me to do is to collect those ideas into a coherent definition. As I see it, the approximate order of priorities is going to be something like this.
- Bring the cloud down to earth. “Cloud” is a much-used but ill-defined term, and is often used in ways that range from the overenthusiastic to the downright misleading. Before I can do anything effective in the cloud space, I need to distance myself from some of the hype. I’m not going to push “cloud” as a brand for its own sake. I’m going to do things that provide concrete and measurable value, for which “cloud” just happens to be an apt and concise label.
- Gather a “community of interest” consisting of both users and developers, both within and outside of Red Hat, to define some of the possible things that a “cloud filesystem” might mean. There are probably many correct answers, which is fine, but the set can’t be unbounded.
- Refine the definition(s) into a loose set of requirements.
- Pick a technology base, decide what existing widgets can be used and which need to be developed.
- Implement, integrate, test, hammer everything into product quality.
Sorry, couldn’t resist that last one. If you don’t recognize the meme, don’t worry about it. Obviously, even though this will be a collaborative process, I already have some opinions about definitions and appropriate technologies and I might as well share them. Obviously, anything that calls itself a cloud filesystem will be part of a more general cloud ecosystem. It must also exhibit the key cloud characteristics of distribution, elasticity, and multi-tenancy. Does this mean a storage cloud within a data center, across data centers, on a user’s desktop, or any combination of these? I don’t know, I don’t pretend to know, and I hope other people will be willing to engage in a dialogue on the issue. I sort of suspect the answer will be all of the above, and I have a picture in my head of an infrastructure that will support all simultaneously, but it’s still gelling so I’ll keep most of it to myself for now. The one thing I will say is that I think one important technological base for a cloud filesystem will be a current parallel filesystem (or other data store), but that’s only a piece, and it’s a provisional opinion anyway, based on the thought that there will be enough work to do without having to reinvent that particularly challenging wheel.
I’m sure some people think that all sounds pretty exciting, and others think it’s mind-numbingly boring. If you’re in the first category, I invite you to share your thoughts and/or to join me at Red Hat’s upcoming Open Source Cloud Computing Forum on July 22. I won’t even have started yet, technically, but I plan to attend anyway because I’m pretty excited about all of this and it seems like a good way to get a head start.
On a more personal note, before I go, I’d like to say something else. This has been my first all-out job hunt in almost twenty years. I’ve been extremely lucky – and I don’t for a moment pretend that it’s anything but luck – that I know people who are much better at networking than I am, and who have been able to help in my search. To all of the people who brought me in for interviews, acted as references, or even just shared tips and encouragement: thank you. I know I’m lousy at following up on these sorts of things, I know at least two groups of people will be disappointed or even offended by how I’ve handled things and/or how they’ve turned out, so where appropriate I offer apologies as well, but mostly I offer my thanks. Computing is a much more social profession than most people think, and it’s the people that matter. If there’s any way I can share some of my good fortune with others, just let me know.
One of the luxuries that I’ve gotten used to at some of the places I’ve worked is being surrounded by people who understand scalability. In my recent job hunt and other between-job activities, I’ve been reminded that a great many people in this industry lack such understanding. So, here’s the thing. If you draw a graph of a system’s component (e.g. server) count on the x axis, and aggregate performance on the y axis, then
Scalability is about the slope, not the height.
The line for a scalable system will have a positive slope throughout the range of interest. By contrast, the line for a non-scalable system will level off or even turn downward as contention effects outweigh parallelism. Note that a non-scalable system might still outperform a scalable system throughout that range of interest, if the scalable system has poor per-unit performance. Scalability is not the same as high performance; it enables higher performance, but does not guarantee it.
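To make the slope-versus-height point concrete, here’s a small numerical sketch. It uses Gunther’s Universal Scalability Law as the contention model; all the parameter values are invented for illustration, not measurements of any real system.

```python
# Slope vs. height, modeled with Gunther's Universal Scalability Law.
# All parameter values below are made up for illustration.

def throughput(n, per_unit, sigma, kappa):
    """Aggregate throughput of n servers, each capable of `per_unit`
    ops/sec in isolation. sigma models contention (serialization),
    kappa models coherency (crosstalk) costs."""
    return per_unit * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# A fast but non-scalable system: high per-unit speed, heavy contention.
fast = [throughput(n, per_unit=1000, sigma=0.08, kappa=0.002)
        for n in range(1, 65)]

# A slower but scalable system: modest per-unit speed, little contention.
scalable = [throughput(n, per_unit=400, sigma=0.005, kappa=0.0)
            for n in range(1, 65)]

# The non-scalable system wins at small n (height), but its curve
# flattens and eventually turns downward, while the scalable one keeps
# a positive slope across the whole range.
```

With these numbers, the “fast” system delivers 2.5x the throughput at one server, yet by 64 servers the “scalable” system is several times ahead, and the fast one has already passed its peak and started losing throughput as servers are added.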
Similarly, building a scalable system might also increase single-request latency or decrease single-stream bandwidth due to higher levels of complexity or indirection. It would be wrong, though, to dismiss a scalable design based on such concerns. If low single-request latency or high single-stream bandwidth are hard requirements with specific numbers attached, then the more scalable system might not suit that particular purpose, but in the majority of cases it’s the aggregate requests or bytes per second that matter most so it’s a good tradeoff. The key to scalability is enabling the addition of more servers, more pipes, more disks, more widgets, not in making any one server or pipe etc. faster. Can you make better use of a network by using RDMA instead of messaging? Sure, that’s nice, it might even be all that’s needed to reach some people’s goals, but it’s a complete no-op where scalability is concerned. Ditto for “parallel” filesystems that only make data access parallel but do nothing to address the metadata bottleneck.
Scaling – more precisely what the true cognoscenti would recognize as horizontal scaling – across all parts of a system is an important key to performance in scientific and enterprise computing, in the grid and the cloud. That’s why all the biggest systems in the world, from Google and Amazon to Roadrunner and Jaguar, work that way. Anybody who doesn’t grasp that, in fact anyone who hasn’t internalized it until it’s instinct, is not qualified to be designing or writing about modern large computer systems. (Ditto, by the way, for anybody who thinks a hand-wave about their favorite protocol possibly allowing such scalability is the same as that protocol and implementation explicitly supporting it. Such people are frauds, to put it nicely.)
I’ll probably be writing more about scaling issues for a while, for reasons that will soon become apparent. Watch this space.