Wannabe of the Month: Skylable

Every month or two, someone comes along and claims to be the new Best Thing Ever in distributed file storage. More often than not, it's just another programmer who recently discovered things like consistent hashing and replication, then slapped together another HTTP object store because that's what people nowadays do instead of writing their own LISP/Forth interpreter or "make" replacement. There's nothing wrong with the exercise itself, of course. It's a great learning experience, and it's how real projects get started. For example, LeoFS might not really be "the leading DFS" as they claim, but it's certainly a serious effort that I'm watching with interest. What gets my goat is always the grandiose claims, often made in the form of comparisons between real production-level file systems like GlusterFS and things that are neither production-level nor file systems.

This month's example is Skylable, which tried to take advantage of the publicity around yesterday's big announcement to pimp their own spare-time project. At first they just tried to position themselves as a competitor to GlusterFS and Ceph when they're clearly not. I tried, as neutrally as I could, to point out that it's not a valid comparison. They didn't take the hint. Instead, @tkojm decided to double down.

Skylable SX beats both Ceph & Gluster in terms of security, code quality, ease of use and robustness. Cheers.

https://twitter.com/tkojm/status/461833887229177856

OK, game on. Such claims really piss me off, not because they're made against my own project but because they're disrespectful to every project in the same space. For example, Tahoe-LAFS plays in exactly this space, and they actually know what they're doing when it comes to security. Making competitive claims that are not only unaccompanied by one shred of evidence but clearly false to anyone with even the most cursory knowledge of the competitive landscape is outright dishonest. The Skylable folks have practically invited more serious comparisons, so I'm going to give them what they asked for and they're not going to like it. Maybe that will keep the next tyro from making the same mistake.

Before I go on, I should mention that this post has nothing to do with Red Hat or the Gluster community. No time or equipment from either was used to test the Skylable code or write up the results. This is not big bad Red Hat picking on a smaller competitor. This is one guy (me), on his own time, trying to find the truth behind some very ambitious claims.

Let's start with ease of use. Here are the steps to install GlusterFS, set up a two-way replicated volume, and mount it on a client.

  • yum install glusterfs... (or equivalent for other distros)

  • /etc/init.d/glusterd start

  • gluster peer probe server2 (from server1)

  • gluster volume create myvol replica 2 server1:/path server2:/path

  • gluster volume start myvol

  • mount -t glusterfs server1:/myvol /wherever (from client)

What's the equivalent for Skylable? Well, you start by downloading, configuring, and building from source. Really. I don't expect such a young project to have stuff in major-distro repos yet. I wouldn't even ding them for not having their own specfiles or whatever, but they brought up ease of use and requiring users to build from source is not good for ease of use. It's even worse if you trip over their unnecessary dependency on libcurl being built with special OpenSSL support, which is not the case on RHEL/Fedora platforms. So much for the "tested on all major UNIX platform" claim.

Once you've done who-knows-what to your system by running "make install" you're ready to begin configuring. Oh, what fun. To do this, you run "sxsetup" which will prompt you for several things and spit out some user-incomprehensible things like an administrator key. Then you have to log in to another node to repeat the process, manually copying and pasting an admin key from one window to another. Then you have to repeat the process again to set things up for the special-purpose programs you only need because it's not a real mountable file system, only this time they call it a "user key" instead. Between the installation mess and the extra steps and the lack of real documentation, I think we can pretty clearly say...

Ease of Use: LIE

OK, so how about security, code quality, and robustness? With regard to security, they make a big deal of having both on-network and on-disk encryption, the latter using client keys. GlusterFS also has both of those, and much of the code has been vetted by Red Hat's renowned security team. Skylable's has been vetted by approximately nobody. A quick perusal of the code shows that it's all home-grown and littered with rookie mistakes. My favorite was this:

  const char *skysalt = "sky14bl3"; /* salt should be 8 bytes */

Yep, that's a constant salt embedded in the code, apparently used in lieu of a real KDF to generate the user's key from their password. Here's a hint: this process adds no entropy to the original text password, no matter how many times you apply EVP_BytesToKey. I actually pointed this one out to them on HN, as did somebody else, and (without admitting error) they claim they'll do better next time, but it does raise an important question. How likely is it that somebody who made such an inexcusable mess of generating a user key then managed to get every other little detail right in their home-grown storage encryption? The odds are nil. I could play free security consultant here and find the next terrible flaw for them, and the next one and the next one, but I shouldn't need to and neither should anyone else. This is just not serious crypto code, so...
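To make the complaint concrete, here is a minimal sketch of what "a real KDF with a real salt" could look like. It's Python rather than C purely for brevity, it is not Skylable's code or anyone's proposed patch, and the names (new_salt, derive_key, ITERATIONS) and the iteration count are my own assumptions for illustration. The point is only that a random per-user salt makes two users with the same password end up with different keys, while a constant salt gives them the same one.

```python
import hashlib
import os

# Assumed work factor for this sketch; tune for real hardware.
ITERATIONS = 100_000


def new_salt() -> bytes:
    """Return a fresh random 16-byte salt (hypothetical helper)."""
    return os.urandom(16)


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


# A constant salt: everyone with the same password gets the same key.
constant_salt = b"sky14bl3"
alice_bad = derive_key("hunter2", constant_salt)
bob_bad = derive_key("hunter2", constant_salt)
print(alice_bad == bob_bad)

# Per-user random salts, stored alongside each user's record.
alice_good = derive_key("hunter2", new_salt())
bob_good = derive_key("hunter2", new_salt())
print(alice_good == bob_good)
```

The salt isn't a secret and gets stored next to the derived material. Its job is to force an attacker to redo the expensive derivation per user instead of building one table that works against everybody.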

Security: LIE

That also tells us where we're headed for code quality. Obviously I can't just rely on "gut feel" and familiarity because I have years of experience with the GlusterFS code and none with this, so I'll try to look at objective measures. I picked several files at random to look at. I deliberately excluded those marked as third-party, but still found a lot of code copied from other codebases - libtool, the ISAAC hash, SQLite. This is not only a terrible code smell but in many cases might be a license violation as well. The data structures seem to be better documented in the Skylable code than in GlusterFS (using the Doxygen style), but otherwise there seemed to be little evidence of this vaunted code quality - even though new code written by a small tight-knit team generally should have a higher median quality than older code written by a much larger team.

Error checking doesn't seem notably more consistent, and cleanup after an error often seems to have involved copying free/close/etc. functions from one code block to another instead of using any of several more robust idioms. I specifically looked at error checking around read(2) and write(2) to see if it handled partial success as well as outright failure. Generally, no. The code uses lt__malloc for no apparent reason, but doesn't get any extra memory safety for the extra effort. Logging/tracing doesn't seem particularly strong. Skylable's own code (as opposed to what they copied) seems to use sprintf more than snprintf. I know these are all incredibly superficial observations, but code quality is an enormously complex topic. These are just the things that are easy to put into words, and they're already more than @tkojm has offered in support of his claim. They're enough to say...
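For anyone wondering what "handling partial success" around write(2) means, here's a minimal sketch of the usual loop. It's in Python, where os.write is a thin wrapper over the system call and likewise returns the number of bytes actually written. The write_all name and its injectable write parameter are my own inventions for illustration, not anything from either codebase.

```python
import os


def write_all(fd, data, write=os.write):
    """Keep calling write until every byte of data has been written.

    A single write call may accept fewer bytes than it was given, so the
    return value has to be checked and the remainder retried.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        written = write(fd, view[total:])
        if written == 0:
            # No progress at all; bail out rather than spin forever.
            raise OSError("write returned 0 bytes")
        total += written
    return total


# Usage: push some bytes through a pipe and read them back.
read_end, write_end = os.pipe()
write_all(write_end, b"hello, world")
os.close(write_end)
print(os.read(read_end, 100))
os.close(read_end)
```

Outright failures surface as exceptions here. The loop exists for the quieter case where the call "succeeds" but only takes part of the buffer. The read side needs the mirror image: keep reading until you have the bytes you expected or hit end of file.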

Better Code Quality: QUESTIONABLE

It's pretty much the same for robustness. There's no evidence from the real world about the robustness of this code, of course. I also see even fewer tests than there are for GlusterFS, so I'd have to say that claim's QUESTIONABLE as well. Of the four claims @tkojm made, therefore, two are questionable and two are outright false, so the whole evaluates to false. Let's move on to the part he didn't even want to talk about: performance.

To test performance, I used two SSD-equipped 16GB instances at Digital Ocean. It took me an hour or so to work through all of the dependency crap and get things set up before I was able to run any tests. Then the very first test I ran was a very simple 4KB file-create loop using sxcp. What was the result?

0.67 files per second

I'm not joking, but apparently they are. GlusterFS is often criticized for its performance on exactly this kind of workload, and I'd be the first to say rightly so, but it's still orders of magnitude better than that. Those are modem speeds, on machines that can and do perform quite well using any other software. There's just no excuse. Why even bother looking further?
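For the curious, the shape of that test is roughly the sketch below. This is not the script I ran: a plain local file write stands in for each sxcp invocation (whose arguments I'm not reproducing here), and the function names and the reading of "4KB" as 4096 bytes are assumptions for illustration. The second function is just the arithmetic behind the "modem speeds" remark.

```python
import os
import tempfile
import time


def create_files(directory, count, size=4096):
    """Create count files of size bytes each; return files per second."""
    payload = b"\0" * size
    start = time.perf_counter()
    for i in range(count):
        # In the real test this step was one upload per file; a local
        # write stands in for it in this sketch.
        path = os.path.join(directory, f"file{i:05d}")
        with open(path, "wb") as f:
            f.write(payload)
    elapsed = time.perf_counter() - start
    return count / elapsed


def throughput_bits_per_second(files_per_second, size=4096):
    """Convert a file-create rate into payload bits per second."""
    return files_per_second * size * 8


with tempfile.TemporaryDirectory() as scratch:
    rate = create_files(scratch, 100)
    print(f"{rate:.2f} files per second (local disk stand-in)")

print(f"{throughput_bits_per_second(0.67):.0f} bits per second at 0.67 files/s")
```

At 0.67 files per second and 4096 bytes per file, that works out to about 22 kbit/s of payload.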

I could keep going. I could go into detail about how the CLI lacks built-in help, how it doesn't seem to include anything to report on node or cluster status, how there don't seem to be any provisions for basics such as rebalancing after a server is added or permanently removing one from the config after the hardware blew up. I could talk about how storing sensitive data unencrypted in a plethora of separate SQLite files is bad for security, performance, and maintainability all at once. But really . . . enough. More than enough. Not even the most fervent advocate of "Minimum Viable Product" could consider this to be past the prototype phase.

Let's try to give the Skylable folks as much benefit of the doubt as we can here. Maybe a few people decided that they'd had enough of some other technical area, and settled on distributed object storage as their next challenge. So they started tinkering around with Skylable SX as a platform for learning and experimentation. I think that's awesome. I want to encourage that kind of thing. Sure, I might have suggested a little more reading and studying how existing systems do the same sorts of things, before diving into code (especially that awful home-grown crypto), but I'd still try to be supportive. The problems only start when someone decides to start chasing money instead of technology. After all, this is a hot area, but new enough that many users don't know how to tell the serious players from the charlatans, so why not try to cash in? So he starts mouthing off about how this is already a serious contender, even though the people actually writing the code know it's still years from that. That's no longer OK. That's encouraging people to use code that will not store their data safely or securely, and I have a zero tolerance policy for that sort of thing.

My message here is really pretty simple: keep coding, keep experimenting, if I have offended anyone's technical sensibilities I sincerely apologize, but for heaven's sake somebody get @tkojm to STFU until it's done. Fix the crypto, fix the performance, fix the packaging and UI. Then we'll have something real to talk about. Maybe, if I find more time than has already been wasted, I'll even dig in and submit a patch or two, fix some of that egregiously bad performance. But not if I keep hearing how it's already better than anything else.
