original article

As to the high end… Fiberchannel is a step forward, but not enough. Forget all these special purpose buses anyway… my suggestion would be to put a gigabit ethernet interface and an IP stack directly in the drive.

IP is a poor match for storage needs, IMO. TCP in particular was designed – and designed rather well – for the high-latency small-packet environment of the Internet, but storage is a low-latency large-packet world. It’s also a world where the hardware must cooperate in ensuring a high level of data integrity, where robust and efficient buffer management is critical, and so on. Even on cost, the equation does not clearly favor storage over IP. Sure, you get to use all of your familiar IP networking gear, but it will need to be upgraded to support various storage-related features already present in Fibre Channel (FC) gear. Even on the controller end, do you really think a GigE interface plus an embedded IP stack is easier or cheaper to incorporate into a controller design than FC? I could go on, but I hope you get the point. “One size fits all” is a bankrupt philosophy. Let IP continue to be designed to suit traditional-networking needs, and for storage use something designed to suit storage needs.

Better to run something like GFS directly on the drive.

No, not better at all. Who wants the drive to be a bottleneck or a single point of failure (SPOF)? The whole point of something like GFS is to avoid those problems via distribution. Putting an IP stack on the drive is bad enough, and now you want to put a multiple-accessor filesystem on it? Dream on. People used to put things like networking stacks and filesystems on separate devices because main processors were so wimpy, but they stopped doing that more than a decade ago. For a reason.

huge RAID arrays with one smart control node (like NetApps, etc)

NetApp doesn’t make disk arrays. If you look at the people who do make high-end disk arrays, you’ll see that they have far more than one brain. A big EMC, IBM, or Hitachi disk array is actually a very powerful multiprocessing computer in its own right, one that just happens to be dedicated to the task of handling storage.

one drive per brain, a full computer in each drive, each drive a full node on the network

…at which point you’re back to distributed systems as they exist today, wondering how to connect each of those single brains to its single drive with a non-proprietary interface. Going around in circles like that doesn’t seem very productive to me.