The SBIC Syndrome

SBIC – pronounced “ess-bick” – stands for “Smartest Boy In Class”. SBICs are very common in the technical community, particularly in online communities; Slashdot is Mecca for SBICs. The typical SBIC is a tech worker, sometimes a developer but more often a sysadmin or support type. Frequent exposure to people who seem stupid or ignorant reinforces SBIC attitudes and behavior; frequent encounters with smarter or more experienced people tend to cure the disease. The central belief set of SBICs can be summed up as follows:

I am smarter than everyone else. If I can’t solve a problem, nobody can; anybody who claims to’ve found such a solution is a liar. Since I can understand any fact quickly, anything I can’t understand quickly is not a fact.

An example of how this attitude plays out is as follows:

  1. Somebody proposes a project to an SBIC.
  2. The SBIC notices some problem with how the project handles XXX.
  3. The SBIC spends a few minutes trying to solve the problem, and doesn’t come up with anything.
  4. According to the belief set described above, the SBIC therefore decides that no solution is possible.
  5. The SBIC infers that the project is doomed to failure, and is therefore a waste of time.
  6. The SBIC tries to “do everyone a favor” and keep them from wasting time, by criticizing and opposing the project.
  7. The SBIC won’t shut up about “the XXX problem” until a comprehensive solution is explained to him in excruciating detail.
  8. Meanwhile, work on every aspect of the project other than “the XXX problem” comes to a standstill.
  9. If XXX was a secondary or expendable feature, the SBIC’s prediction of doom becomes a self-fulfilling prophecy.

A slight variation of the above occurs when the SBIC sees a problem X, devises a solution Y which introduces problem Z, and then starts complaining about how the project has problem Z. It is not in the SBIC nature to consider the possibility that solutions other than Y exist. To them, Y and Z follow inevitably from X.

There’s another part of the SBIC mindset that provides a solution (yes, I realize there might be others) to dealing with them. It goes like this:

Because I’m so smart, my perspective and priorities are important, and you should place a high priority on addressing them.

Well…no. You’re not, they’re not, and I don’t feel like it. Fly away before you get swatted.

File-sharing Manifesto

Most of the filesharing systems out there nowadays are barely better than FTP, which is one of the very oldest protocols on the Internet. Sure, the underlying technology is a lot different, but functionally we don’t seem to’ve advanced very far. Think about it. Most of the filesharing code is based on a model of non-transparent standalone applications, whole-file upload and download, etc. The transparent veneer of search capability doesn’t count for a whole lot compared to Archie/Veronica or even grepping through the ubiquitous 00INDEX.TXT. I was grabbing recipes and programs and images just as productively via FTP twelve years ago as I can do today using Gnutella or Freenet. The only differences are the bandwidth and the content that’s available.

What I’d like to suggest is that we need to set our sights higher. FTP functionality isn’t worth striving for. The ultimate irony is seeing criticism of a project for not being innovative enough, from people whose functional target is barely an improvement over FTP and who don’t even have that fully working yet. If you want to get serious about innovation, here are some ideas – first from a feature level, and then from a technical level.

  1. There absolutely must be a guarantee of data availability, between the time data is inserted and the time it is explicitly deleted or obsoleted. The system should never drop data during normal operation, and should endeavor to prevent such loss even in the face of common node/link failures.
  2. Directories and folders are nice. Flat namespaces are just so “twenty years ago” and yet, underneath the search veneer, that’s what most filesharing apps have. It should be possible to upload a source tree, for example, and have the filesharing network preserve its structure instead of having to use an archiving tool.
  3. Going one level deeper, I should be able to access part of a file without having to fetch the whole file.
  4. I should be able to update or overwrite a datum in place, at whatever granularity we’re using, instead of having to create a new datum with a new key and keeping track of what obsoletes which.
  5. Transparency is nice. Once I’ve found the aforementioned source tree, I should be able to mount it, browse it and use it just like I do with my local files. Yes, ultimately I’d even like to be able to do memory-mapped I/O on those files.
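To make items 3 and 4 a little more concrete, here is a toy sketch in Python of a file held as fixed-size blocks, where a read touches only the blocks it needs and a write updates them in place. The names (`BlockFile`, `BLOCK_SIZE`, `fetched`) are made up for illustration and the "network" is just a dictionary; it is a sketch of the idea, not of any existing filesharing system.

```python
BLOCK_SIZE = 4  # tiny on purpose; a real system would use something like 4096


class BlockFile:
    """Hypothetical file stored as independent fixed-size blocks."""

    def __init__(self, data: bytes = b""):
        self.blocks = {}    # block index -> bytes of length BLOCK_SIZE
        self.size = 0       # logical file length in bytes
        self.fetched = []   # block indexes touched by reads, for illustration
        self.write(0, data)

    def _fetch(self, index: int) -> bytes:
        # Stand-in for fetching one block from the network.
        self.fetched.append(index)
        return self.blocks.get(index, b"").ljust(BLOCK_SIZE, b"\0")

    def read(self, offset: int, length: int) -> bytes:
        """Return up to `length` bytes, fetching only the blocks involved."""
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, start = divmod(pos, BLOCK_SIZE)
            take = min(BLOCK_SIZE - start, end - pos)
            out += self._fetch(index)[start:start + take]
            pos += take
        return bytes(out)

    def write(self, offset: int, data: bytes) -> None:
        """Overwrite bytes in place; no new key, no new copy of the file."""
        pos = offset
        remaining = memoryview(data)
        while len(remaining):
            index, start = divmod(pos, BLOCK_SIZE)
            take = min(BLOCK_SIZE - start, len(remaining))
            block = bytearray(self.blocks.get(index, b"").ljust(BLOCK_SIZE, b"\0"))
            block[start:start + take] = remaining[:take]
            self.blocks[index] = bytes(block)
            remaining = remaining[take:]
            pos += take
        self.size = max(self.size, pos)
```

Reading two bytes from the middle of a twelve-byte file fetches one block rather than three, and a later two-byte write changes that same block without creating anything new.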

Note that I don’t address security/anonymity in this list. It’s not because those things aren’t important; au contraire, it’s because those issues are so important and so complicated that they’d double the length of this already-long article. I’m a strong believer in maintaining focus.

So, what technical directions are implied by the above? This is probably going to be even more controversial than the previous list, but here goes:

  1. The filesharing code needs to be implemented as an actual filesystem, not a separate application. On Windows, with whole-file access, an Explorer namespace extension would do. One implication here is that Java is not an option. Neither is Perl, neither is Python, etc. Get over it. I’m a Python lover myself, and I’ve actually prototyped a significant part of such a filesystem in user-space using Python, but I know that sooner or later it will have to turn into C and go into the kernel.
  2. We need directory (and possibly attribute) operations as well as file operations. Data location should be some combination of searching and directory traversal.
  3. Data storage, access, and transport need to be block-centered, not file-centered. The blocks may well be identified as tree:file_path:offset, but for the most part each block is still moved, tracked, and acted upon throughout most of the system as an independent entity.
  4. Data caching throughout the network is absolutely necessary for performance. If we’re caching and we allow in-place modification, we need cache coherency. Been there, done that, c’est la vie.
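Items 3 and 4 together can be sketched in a few lines of Python. The block key follows the tree:file_path:offset form mentioned above; everything else (`BlockKey`, `Origin`, `Cache`, and the choice of a simple write-invalidate scheme) is an assumption made for illustration. Real coherency protocols have to cope with failures, races and ownership transfer, none of which appear here.

```python
from typing import NamedTuple


class BlockKey(NamedTuple):
    """Identifies one block as tree:file_path:offset."""
    tree: str
    file_path: str
    offset: int

    def __str__(self) -> str:
        return f"{self.tree}:{self.file_path}:{self.offset}"


class Origin:
    """Hypothetical authoritative store that remembers who caches what."""

    def __init__(self):
        self.blocks = {}    # BlockKey -> bytes
        self.holders = {}   # BlockKey -> set of caches holding a copy

    def read(self, key, cache):
        self.holders.setdefault(key, set()).add(cache)
        return self.blocks[key]

    def write(self, key, data, writer=None):
        self.blocks[key] = data
        # Write-invalidate: every other cached copy is now stale.
        for cache in self.holders.pop(key, set()):
            if cache is not writer:
                cache.invalidate(key)
        if writer is not None:
            self.holders[key] = {writer}


class Cache:
    """Hypothetical node-local cache in front of the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.local = {}
        self.misses = 0

    def read(self, key):
        if key not in self.local:
            self.misses += 1
            self.local[key] = self.origin.read(key, self)
        return self.local[key]

    def write(self, key, data):
        # In-place modification goes through the origin so others hear of it.
        self.origin.write(key, data, writer=self)
        self.local[key] = data

    def invalidate(self, key):
        self.local.pop(key, None)
```

With two caches reading the same block, a write through one of them drops the other's copy, so its next read misses and picks up the new data instead of serving the stale block.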

Phew. That’s a lot of work, isn’t it? One might well ask if it’s worth it. The end result will be a lot less like the filesharing apps we know today than like a full-blown distributed filesystem. That’s kind of the point. We should be using advances in technology – and there have been quite a few – to target a high functional level, not a low one.

If somebody did all this (I’ve already contributed to one major piece, and am working on another), it opens up some interesting possibilities. Such a system would be useful not just for exchanging music and porn pix with your buddies, but also as an important piece of infrastructure for real applications. Why get involved in complex presence detection and messaging to communicate, when you can put information in a shared file? Right now there’s this weird split between local and distributed applications, with the former often using shared files and the latter using complex messaging. Clearly the application developers would prefer to use shared files, and the only reason they don’t do it for distributed stuff is because the infrastructure isn’t there. Either the functionality’s totally stone-age, or the performance and robustness are substandard. Give them a real way to share files without those limitations, and a whole bunch of things that are currently very difficult will become much less so.

Data Loss in Freenet

Some genius came up with the brilliant idea of putting an SQL interface on top of Freenet. I couldn’t help myself, and immediately made this post to show how crazy I thought the idea was. Ian Clarke, the author of Freenet, kept splitting hairs over the phrase “practically random” and we got into it a little bit. Later, in a separate subthread, I had occasion to provide a more detailed explanation of what I see as the issues involving data loss in Freenet. With luck, Mr. Clarke might take the opportunity to learn a bit about matching protocols to design goals, about the dangers of overhyping a project, and about learning from others’ experience instead of assuming he knows the most about everything.

Fixing the Internet

On the egroups “decentralization” mailing list, Todd Boyle posted this screed. There’s a lot of interesting stuff in it, but I consider it fundamentally misguided. Here’s my reply explaining why.

Standing on the Shoulders of Giants

original article

“If I have been able to see further, it was only because I stood on the shoulders of giants.”

That’s one of my favorite quotes, and coming from anyone else I would applaud it…but not from Linus. Why not? Because Linus has one of the worst “my farts smell better” attitudes I’ve ever seen. It’s well known that one of the *worst* ways to get an idea incorporated into the Linux kernel is to say that it’s been tried and found successful in some other OS. Linus, and the other senior Linux developers, seem to loathe the idea that someone else thought of something before they did, or – heaven forbid! – better than they did. The spiffy new Linux way of doing things – union mounts, kiobufs – is always assumed to be better than anyone else’s way of doing the same things just because it came from Linux people.

Getting back to the topic, people need to read some of the exchanges between Linus and Andrew S. Tanenbaum of MINIX fame. Does that look like proper acknowledgement of a debt owed to another for inspiration or ideas? No, Linus has one of the worst records out there of failing to thank the giants on whose shoulders he stands. For him of all people to throw that quote in someone else’s face is the very height of hypocrisy.

Hard-disk write caches

original article

The kernel can only ask the hard drive to flush the data to disk. The disk need not comply, despite returning a “yes I did” result.

That’s an important issue. I’ll try to provide a couple of answers.

how can the consumer really know what the drive decides to do?

Well, there are at least two ways:

  1. Turn off write caching.
  2. Set the “Force Unit Access” (FUA) bit on the Write command, if it’s a SCSI/FC disk.

SCSI gives you other options as well. For example, if you’re using tagged command queuing, you can set FUA only on the last command of a sequence (e.g. a transaction). That way, you can allow the disk or storage subsystem to do appropriate reordering, combining, etc. and you’ll still be sure that by the time that last command completes all the commands logically ahead of it (as specified by the tags) have completed as well. It’s tres cool, and it’s one of SCSI’s biggest benefits compared to IDE.
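Here is a toy model in Python of what that buys you. It is emphatically not how any real drive's firmware works: `ToyDisk` and its methods are invented for illustration, and it assumes every command is ordered, so a write with FUA set cannot complete until everything queued ahead of it is on the media too.

```python
class ToyDisk:
    """Hypothetical disk with a volatile write cache and ordered queuing."""

    def __init__(self):
        self.cache = []   # acknowledged writes not yet on media, in order
        self.media = {}   # lba -> data that would survive power loss

    def write(self, lba, data, fua=False):
        self.cache.append((lba, data))
        if fua:
            # Assumed ordered queuing: the FUA write completes only after
            # everything ahead of it has reached the media as well.
            self._destage()
        return "ok"       # without FUA this is reported before the data is safe

    def _destage(self):
        for lba, data in self.cache:
            self.media[lba] = data
        self.cache.clear()

    def power_loss(self):
        # Whatever was still in the volatile cache is gone.
        self.cache.clear()


disk = ToyDisk()
disk.write(1, b"update a")              # may sit in cache
disk.write(2, b"update b")              # may sit in cache
disk.write(3, b"commit", fua=True)      # last command of the transaction
disk.write(4, b"next transaction")      # acknowledged, but only cached
disk.power_loss()
```

After the simulated power loss the media holds blocks 1 through 3 and not block 4: the whole transaction is safe once its final FUA write completes, while the drive stayed free to cache the earlier writes in the meantime.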

Tagged command queuing also comes in handy if you have to force write caching off – which BTW is common and not particularly difficult on either SCSI or IDE drives. Since you’re now forced to deal with full rotational latency, the importance of overlapping unrelated operations (by putting them on different queues) becomes even greater.

This stuff is not documented on the box the hard drive comes in nor on the mfg web site.

Tsk tsk, that’s a shame. It’s pretty common knowledge among storage types, but still far from universal. Go look on comp.arch.storage and you’ll see a recurring pattern of people finding this out for the first time and sparking a brief flurry of posts by asking about it.

The problem with having the drive notify the host that a write has been fully destaged is that target-initiated communication (aside from reconnecting to service an earlier request) is poorly supported even in SCSI. Hell, it’s even hard to talk about it without tripping over the “initiator” (host) vs. “target” (disk) terminology. Most devices lack the capability to make requests in that direction, and most host adapters (not to mention drivers) lack support for receiving them. AEN (Asynchronous Event Notification) was the least-implemented feature in SCSI.

There’s also a performance issue. Certainly you don’t want to be generating interrupts by having the disk call back for *every* request, but only for selected requests of particular interest. So now you need to add a flag to the CDB to indicate that a callback is required. You need to go through the whole nasty SCSI standards process to determine where the flag goes, how requests are identified in the callback, etc. Then you need every OS, driver, adapter, controller, etc. to add support for propagating the flag and handling the callback. Ugh.

It’s a great idea, really it is. It’s The Right Way(tm). But it’s just never going to happen in the IDE world, and it’s almost as unlikely in the SCSI/FC world. 1394 seems a little more amenable to this, but I have no idea whether it’s actually done (I doubt it) because even though I know they exist I’ve never actually seen a 1394 drive close up.

I hope all this helps shed some light on the subject.