Teaser: erasure codes and SMB, shingled disks and CDMI, all with a side of snark.
Last week I went to SNIA’s Storage Developer Conference in Santa Clara. It was a bit of a change of pace for me, in a bunch of ways. For one thing, most of the conferences I go to tend to be heavily open-source oriented, whereas SNIA tends to be dominated by the closed-source Big Storage folks. I’m not judging anyone, but this does imply a different vocabulary and emphasis to get one’s point across. This was also a bit different because I was explicitly not representing my employer. I was originally invited to participate as a member of Stephen Foskett’s Tech Field Day roundtable broadcast live from the event, for which I am very grateful to SNIA and to Stephen. Once I was already set to attend, I ended up doing a BoF and even a talk on the filesystem track, not that I mind but it did make my role there a little hard to explain. Anyway, enough about all that. What was going on technologically?
The most notable thing at the conference was the prevalence of SMB over NFS. It’s hard to interpret this as a function of the audience, since the big NFS players are also to a large degree the big SNIA players and were there in force. NFS hardly showed up in any of the talks, though. By way of contrast, SMB was either the topic or at least mentioned in a great many talks, and there was a big SMB plugfest going on downstairs the whole time. I kept hearing that Codenomicon was causing some havoc, but people couldn’t say any more because I wasn’t actually part of the plugfest and it’s all under NDA to avoid spooking the peasantry. It’s important to note that finding bugs at a plugfest is a Very Good Thing, because whatever’s found and fixed there won’t be found and exploited in the wild. In any case, I really came away with a strong feeling that the fragmentation and territoriality among the NFSv4+ server vendors have finally taken their toll. Now that the Microsoft folks and the Samba folks and others are playing together more nicely, SMB seems well on the way to becoming the technology base of choice for that kind of thing. And no, I don’t feel threatened by that, because I see both NFS and SMB as more front-end technologies that in fact work very well with a fully distributed back-end technology such as GlusterFS.
Another technology that seemed to get mentioned many times was object storage, and particularly CDMI (Cloud Data Management Interface). With things like S3 and Swift already out there, I always thought it would be difficult for something like CDMI (a 224-page spec, with 24 of those pages boilerplate before the overview) to gain any traction at all, but CDMI interfaces were mentioned prominently in several talks and by all accounts the interop demo (part of the cloud plugfest which was also running concurrently with SDC itself) was pretty impressive. The Cleversafe guys were there in force, as always, as were the Scality folks making some very ambitious claims. One of the big surprises for me is that GoDaddy is apparently planning to offer an object-storage service, following in the footsteps of Amazon and Rackspace and DreamHost. Speaking of DreamHost and their Ceph-powered DreamObjects, the RADOS layer of Ceph seems to have grown a new “class” abstraction to do the same sort of server-side filtering/transformation as we’ve considered doing in GlusterFS with translators. In any case, as we’ve seen with our own “UFO” object-storage interface, object storage just seems to be hot hot hot.
Another entity in the object-storage space is Microsoft, whose Azure Storage offers that paradigm along with several others (which is cool already). What’s even cooler is what they’ve done with erasure coding in that storage, and if I had to recommend one set of slides to peruse it would be that one. Basically the problem with erasure codes is that reading and especially repairing data can require talking to many more servers than would be the case with simple replication – to which erasure codes are preferable mostly because of their better storage efficiency. What the Azure guys did was to develop a new kind of erasure code that allows for more efficient repair, and even for a very flexible tradeoff of storage efficiency vs. repair cost (mirroring the tradeoffs that are already possible for storage efficiency vs. protection against N simultaneous disk failures). Very cool stuff, and they did a great job explaining it too.
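To make the repair-cost point a bit more concrete, here is a minimal Python sketch of the general "local parity" idea. It is not the Azure code itself: the fragment count (12), the group sizes, and every function name are assumptions chosen for illustration, and the extra global parities a real code would need to survive multiple simultaneous failures are left out entirely. All it shows is that splitting fragments into smaller groups, each with its own XOR parity, lets a single lost fragment be rebuilt by reading only its own group.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_all(fragments):
    """XOR a non-empty list of equal-length fragments together."""
    return reduce(xor_bytes, fragments)


def encode_local(fragments, group_size):
    """Split fragments into groups; return [(data_fragments, xor_parity), ...].

    Hypothetical layout: one XOR parity per group, no global parities.
    """
    groups = []
    for start in range(0, len(fragments), group_size):
        group = fragments[start:start + group_size]
        groups.append((group, xor_all(group)))
    return groups


def repair(groups, group_idx, lost_idx):
    """Rebuild one lost data fragment from its own group only.

    Returns (rebuilt_fragment, number_of_fragments_read).
    """
    data, parity = groups[group_idx]
    reads = [f for j, f in enumerate(data) if j != lost_idx] + [parity]
    return xor_all(reads), len(reads)


# Illustrative data: 12 fragments of 4 bytes each (sizes are assumptions).
fragments = [bytes([i]) * 4 for i in range(1, 13)]

for group_size in (12, 6, 3):
    groups = encode_local(fragments, group_size)
    rebuilt, reads = repair(groups, group_idx=0, lost_idx=0)
    assert rebuilt == fragments[0]
    stored = len(fragments) + len(groups)  # data plus one parity per group
    print(f"group size {group_size:2d}: {stored} fragments stored, "
          f"{reads} read to repair one")
```

Smaller groups mean fewer reads per repair but more parity fragments stored, which is the storage-efficiency vs. repair-cost knob described above, in toy form.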
The last thing I’ll mention is Garth Gibson’s talk about shingled disks. The idea there is that disk vendors are under immense pressure to deliver ever higher densities at very low cost. The rise of solid-state storage at the $/IOPS design point forces them even further toward the $/GB design point, and they’re reaching the point where, even with technologies like heat-assisted recording and patterned media, they just can’t write tracks that are any thinner. They can read thinner tracks, though, and that’s where the shingled approach comes in. The idea is that writing each track (within a “band”) actually overwrites 2/3 or so of the previous track as well. Thus, that previous track is still readable but not rewritable without also rewriting the second track that overwrote it, and the third track that overwrote the second, and so on to the end of the band. Besides being terrible for performance, this creates a huge window where a failure in the middle of the process could lose data. A more logical approach would be to treat a shingled disk somewhat like a WORM drive, using a log-structured filesystem. Unfortunately, those have fallen into disrepair and disrepute as everyone has gone gaga over btree/COW filesystems, so some resurrection would be necessary. Also, it would be nice to have at least some easily rewritable space on a mostly shingled disk, even if it’s just to store log-cleaning state. Most people seemed to think you should just pair the shingled device with a smaller non-shingled device, but that doesn’t really solve the “have to guess the ratio” problem any more than one unshingled track at the end of each band would (not per disk because of seek-time issues), and I know users would hate the logistical issues of paired devices.
I really hope that we don’t end up with the highest-capacity kinds of disks being almost impossible to use effectively because you have to provision two kinds of storage and then deal with both sets of performance anomalies for a single volume.
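For a rough feel of why in-place rewrites within a band hurt and why a log-structured layout is attractive, here is a toy Python model. Everything in it is an assumption for illustration: the band sizes, the names `in_place_cost` and `LogBand`, and the simplification that a cleaner can stage live data somewhere else while it compacts a band. It counts whole-track writes only and ignores seek time, the partial-overlap geometry, and failure windows.

```python
def in_place_cost(band_tracks: int, track: int) -> int:
    """Tracks physically written to update one track in place.

    Rewriting `track` clobbers the next one, so everything from
    `track` to the end of the band has to be rewritten in order.
    """
    if not 0 <= track < band_tracks:
        raise ValueError("track outside band")
    return band_tracks - track


class LogBand:
    """Append-only band: an update goes to the next free track and the
    old copy simply becomes garbage until the band is cleaned."""

    def __init__(self, band_tracks: int):
        self.band_tracks = band_tracks
        self.log = []            # logical id stored in each physical track
        self.latest = {}         # logical id -> physical track of live copy
        self.tracks_written = 0

    def write(self, logical_id) -> None:
        if len(self.log) == self.band_tracks:
            self.clean()
        if len(self.log) == self.band_tracks:
            raise RuntimeError("band is full of live data")
        self.latest[logical_id] = len(self.log)
        self.log.append(logical_id)
        self.tracks_written += 1

    def clean(self) -> None:
        """Compact the band by rewriting only the live tracks from the start.

        Assumes the live data can be staged elsewhere during compaction.
        """
        live = sorted(self.latest, key=self.latest.get)
        self.log = []
        for logical_id in live:
            self.latest[logical_id] = len(self.log)
            self.log.append(logical_id)
            self.tracks_written += 1


# Hypothetical workload: 50 logical tracks, then 60 updates to logical track 0.
updates = 60
in_place_total = updates * in_place_cost(band_tracks=50, track=0)

band = LogBand(band_tracks=100)   # same 50 tracks of data plus 50 spare tracks
for logical_id in range(50):
    band.write(logical_id)
initial = band.tracks_written
for _ in range(updates):
    band.write(0)

print("in-place track writes:", in_place_total)
print("log-structured track writes:", band.tracks_written - initial)
```

The log version only wins because it was handed spare tracks to append into, and how many to hand it is exactly the "have to guess the ratio" problem; the model just makes the cost of guessing visible.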
There was certainly a lot of other stuff at the conference as well, including the inevitable bubble-inflating dose of “big data” hadoop-la, but those didn’t really interest me as much as the things above. So thanks to everyone for inviting me, or educating me, or just hanging out with me, and see you all next year.