Every month or two, someone comes along and claims to be the new Best Thing
Ever in distributed file storage. More often than not, it's just another
programmer who recently discovered things like consistent hashing and
replication, then slapped together another HTTP object store because that's
what people nowadays do instead of writing their own LISP/Forth interpreter or
"make" replacement. There's nothing wrong with the exercise itself, of course.
It's a great learning experience, and it's how real projects get started. For
example, LeoFS might not really be "the leading DFS" as they claim, but
it's certainly a serious effort that I'm watching with interest. What gets my
goat is always the grandiose claims, often made in the form of comparisons
between real production-level file systems like GlusterFS and things that are
neither production-level nor file systems.
This month's example is Skylable, which tried to take advantage of the
publicity around yesterday's big announcement to pimp their own spare-time
project. At first they just tried to position themselves as a competitor to
GlusterFS and Ceph when they're clearly not. I tried, as neutrally as I could,
to point out that it's not a valid comparison. They didn't take the hint.
Instead, @tkojm decided to double down.
Skylable SX beats both Ceph & Gluster in terms of security, code quality,
ease of use and robustness. Cheers.
OK, game on. Such claims really piss me off, not because they're made against
my own project but because they're disrespectful to every project in the same
space. For example, Tahoe-LAFS plays in exactly this space, and they
actually know what they're doing when it comes to security. Making competitive
claims that are not only unaccompanied by one shred of evidence but clearly
false to anyone with even the most cursory knowledge of the competitive
landscape is outright dishonest. The Skylable folks have practically invited
more serious comparisons, so I'm going to give them what they asked for and
they're not going to like it. Maybe that will keep the next tyro from making
the same mistake.
Before I go on, I should mention that this post has nothing to do with Red Hat
or the Gluster community. No time or equipment from either was used to test
the Skylable code or write up the results. This is not big bad Red Hat picking
on a smaller competitor. This is one guy (me), on his own time, trying to find
the truth behind some very ambitious claims.
Let's start with ease of use. Here are the steps to install GlusterFS, set up
a two-way replicated volume, and mount it on a client.
yum install glusterfs... (or equivalent for other distros)
gluster peer probe server2 (from server1)
gluster volume create myvol replica 2 server1:/path server2:/path
gluster volume start myvol
mount -t glusterfs server1:/myvol /wherever (from client)
What's the equivalent for Skylable? Well, you start by downloading,
configuring, and building from source. Really. I don't expect such a young
project to have stuff in major-distro repos yet. I wouldn't even ding them for
not having their own specfiles or whatever, but they brought up ease of use, and
requiring users to build from source is not good for ease of use. It's even
worse if you trip over their unnecessary dependency on libcurl being built with
special OpenSSL support, which is not the case on RHEL/Fedora platforms. So
much for the "tested on all major UNIX platform" claim.
Once you've done who-knows-what to your system by running "make install", you're
ready to begin configuring. Oh, what fun. To do this, you run "sxsetup" which
will prompt you for several things and spit out some user-incomprehensible
things like an administrator key. Then you have to log in to another node
to repeat the process, manually copying and pasting an admin key from one
window to another. Then you have to repeat the process again to set things up
for the special-purpose programs you only need because it's not a real
mountable file system, only this time they call it a "user key" instead.
Between the installation mess and the extra steps and the lack of real
documentation, I think we can pretty clearly say...
Ease of Use: LIE
OK, so how about security, code quality, and robustness? With regard to
security, they make a big deal of having both on-network and on-disk
encryption, the latter using client keys. GlusterFS also has both of those,
and much of the code has been vetted by Red Hat's renowned security team.
Skylable's has been vetted by approximately nobody. A quick perusal of the
code shows that it's all home-grown and littered with rookie mistakes. My
favorite was this:
const char *skysalt = "sky14bl3"; /* salt should be 8 bytes */
Yep, that's a constant salt embedded in the code, apparently used in lieu of a
real KDF to generate the user's key from their password. Here's a hint: this
process adds no entropy to the original text password, no matter how many
times you apply EVP_BytesToKey. I actually pointed this one out to them on HN,
as did somebody else, and (without admitting error) they claim they'll do
better next time, but it does raise an important question. How likely is it
that somebody who made such an inexcusable mess of generating a user key then
managed to get every other little detail right in their home-grown storage
encryption? The odds are nil. I could play free security consultant here and
find the next terrible flaw for them, and the next one and the next one, but I
shouldn't need to and neither should anyone else. This is just not serious
crypto code, so...
Better Security: LIE
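To show why a constant salt plus repeated hashing buys nothing, here is a deliberately toy C sketch. It uses 64-bit FNV-1a as a stand-in for a KDF. It is NOT cryptographic, and the function name toy_derive is hypothetical, not from either codebase. The "stretching" rounds are a bijection on 64-bit values. They cost time but cannot add entropy, which is the point about applying EVP_BytesToKey over and over. With a fixed salt every user with the same password gets the same key, so one precomputed dictionary covers everyone. A per-user random salt at least breaks that. Real code would use a vetted, deliberately slow KDF (for example OpenSSL's PKCS5_PBKDF2_HMAC with a salt from RAND_bytes, or scrypt/Argon2) rather than anything like this.

```c
#include <stddef.h>
#include <stdint.h>

/* TOY stand-in for a key derivation function (NOT cryptographic):
 * 64-bit FNV-1a over salt || password, followed by `rounds` extra mixing steps. */
static uint64_t toy_derive(const unsigned char *salt, size_t saltlen,
                           const char *password, unsigned rounds)
{
    const uint64_t prime = 1099511628211ULL;     /* 64-bit FNV prime */
    uint64_t h = 1469598103934665603ULL;         /* 64-bit FNV offset basis */

    for (size_t i = 0; i < saltlen; i++) {
        h ^= salt[i];
        h *= prime;
    }
    for (const char *p = password; *p != '\0'; p++) {
        h ^= (unsigned char)*p;
        h *= prime;
    }
    /* "Stretching": each round (xorshift, then multiply by an odd constant)
     * is a bijection on 64-bit values, so it can slow an attacker down but can
     * never add entropy beyond what the password and salt already had. */
    for (unsigned r = 0; r < rounds; r++) {
        h ^= h >> 33;
        h *= prime;
    }
    return h;
}
```

With the constant salt "sky14bl3", two different users who pick "hunter2" derive identical keys. With distinct per-user salts (stored alongside the data, as salts normally are), the same password yields different keys.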
That also tells us where we're headed for code quality. Obviously I can't just
rely on "gut feel" and familiarity because I have years of experience with the
GlusterFS code and none with this, so I'll try to look at objective measures. I
picked several files at random to look at. I deliberately excluded those marked
as third-party, but still found a lot of code copied from other codebases -
libtool, the ISAAC hash, SQLite. This is not only a terrible code smell but in
many cases might be a license violation as well. The data structures seem to be
better documented in the Skylable code than in GlusterFS (using the Doxygen
style), but otherwise there seemed to be little evidence of this vaunted code
quality - even though new code written by a small tight-knit team generally
should have a higher median quality than older code written by a much larger
team. Error checking doesn't seem notably more consistent, and cleanup after an error
often seems to have involved copying free/close/etc. functions from one code
block to another instead of using any of several more robust idioms. I
specifically looked at error checking around read(2) and write(2) to see if it
handled partial success as well as outright failure. Generally, no. The code
uses lt__malloc for no apparent reason, but doesn't get any extra memory safety
for the extra effort. Logging/tracing doesn't seem particularly strong.
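To make the read(2)/write(2) and cleanup points concrete, here is a minimal C sketch, assuming POSIX. It shows partial success being handled (short reads, short writes, EINTR) together with the common single-exit "goto cleanup" idiom, one of the more robust alternatives to copying free/close calls into every error branch. The names write_all and copy_file_to_fd are hypothetical and are not taken from Skylable or GlusterFS.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on failure. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;          /* interrupted before any data was written: retry */
        if (n <= 0)
            return -1;         /* real error (or no progress): give up */
        p += n;                /* partial success: skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}

/* Copy the file at src_path into dst_fd. Every exit path runs through one
 * cleanup block, so adding a resource means adding one release line. */
static int copy_file_to_fd(const char *src_path, int dst_fd)
{
    int ret = -1;
    int src = -1;
    char *buf = NULL;
    const size_t bufsize = 64 * 1024;

    src = open(src_path, O_RDONLY);
    if (src < 0)
        goto cleanup;

    buf = malloc(bufsize);
    if (buf == NULL)
        goto cleanup;

    for (;;) {
        ssize_t n = read(src, buf, bufsize);
        if (n < 0 && errno == EINTR)
            continue;          /* interrupted: retry the read */
        if (n < 0)
            goto cleanup;      /* real read error */
        if (n == 0)
            break;             /* end of file */
        /* A short read is not an error: just write out what we got. */
        if (write_all(dst_fd, buf, (size_t)n) < 0)
            goto cleanup;
    }
    ret = 0;

cleanup:
    free(buf);                 /* free(NULL) is a harmless no-op */
    if (src >= 0)
        close(src);
    return ret;
}
```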
Skylable's own code (as opposed to what they copied) seems to use sprintf more
than snprintf. I know these are all incredibly superficial observations, but
code quality is an enormously complex topic. These are just the things that are
easy to put into words, and they're already more than @tkojm has offered in
support of his claim. They're enough to say...
Better Code Quality: QUESTIONABLE
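On the sprintf point, the difference is easy to show. sprintf will happily write past the end of its buffer. snprintf never writes more than the size you give it, and its return value tells you whether the result was truncated. A minimal sketch, with make_path as a hypothetical helper:

```c
#include <stddef.h>
#include <stdio.h>

/* Build "<dir>/<name>" into out. Returns 0 on success, or -1 on an output
 * error or if the result did not fit (out is then truncated but still
 * NUL-terminated). sprintf(out, "%s/%s", dir, name) would instead overrun
 * out whenever the combined string is too long. */
static int make_path(char *out, size_t outsz, const char *dir, const char *name)
{
    int n = snprintf(out, outsz, "%s/%s", dir, name);
    if (n < 0)
        return -1;               /* output/encoding error */
    if ((size_t)n >= outsz)
        return -1;               /* needed n + 1 bytes: result was truncated */
    return 0;
}
```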
It's pretty much the same for robustness. There's no evidence from the real
world about the robustness of this code, of course. I also see even fewer
tests than there are for GlusterFS, so I'd have to say that claim's
QUESTIONABLE as well. Of the four claims @tkojm made, therefore, two are
questionable and two are outright false, so the whole evaluates to false.
Let's move on to the part he didn't even want to talk about: performance.
To test performance, I used two SSD-equipped 16GB instances at Digital Ocean.
It took me an hour or so to work through all of the dependency crap and get
things set up before I was able to run any tests. Then the very first test I
ran was a very simple 4KB file-create loop using sxcp. What was the result?
I'm not joking, but apparently they are. GlusterFS is often criticized for its
performance on exactly this kind of workload, and I'd be the first to say
rightly so, but it's still orders of magnitude better than that. Those are
modem speeds, on machines that can and do perform quite well using any other
software. There's just no excuse. Why even bother looking further?
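SX isn't mountable, so that test had to go through sxcp. To get the same kind of measurement against something that is mountable (a GlusterFS client mount, or a local disk as a baseline), a 4KB file-create loop can be sketched in C as below, assuming POSIX. This is an illustration of the workload, not the exact test that was run, and the function name and file naming scheme are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Create up to `count` new 4KB files in `dir`, stopping at the first failure.
 * Returns how many were created and stores the elapsed wall time in *secs. */
static int create_4k_files(const char *dir, int count, double *secs)
{
    char buf[4096];
    char path[4096];
    struct timespec t0, t1;
    int created = 0;

    memset(buf, 'x', sizeof buf);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        int n = snprintf(path, sizeof path, "%s/file%06d", dir, i);
        if (n < 0 || (size_t)n >= sizeof path)
            break;                        /* path would not fit */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0)
            break;
        ssize_t w = write(fd, buf, sizeof buf);
        int rc = close(fd);
        /* For simplicity a short write counts as a failure here. */
        if (w != (ssize_t)sizeof buf || rc != 0)
            break;
        created++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *secs = (double)(t1.tv_sec - t0.tv_sec)
          + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return created;
}
```

Dividing the returned count by the elapsed seconds gives files per second.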
I could keep going. I could go into detail about how the CLI lacks built-in
help, how it doesn't seem to include anything to report on node or cluster
status, how there don't seem to be any provisions for basics such as
rebalancing after a server is added or permanently removing one from the config
after the hardware blew up. I could talk about how storing sensitive data
unencrypted in a plethora of separate SQLite files is bad for security,
performance, and maintainability all at once. But really . . . enough.
More than enough. Not even the most fervent advocate of "Minimum Viable
Product" could consider this to be past the prototype phase.
Let's try to give the Skylable folks as much benefit of the doubt as we can
here. Maybe a few people decided that they'd had enough of some other
technical area, and settled on distributed object storage as their next
challenge. So they started tinkering around with Skylable SX as a platform for
learning and experimentation. I think that's awesome. I want to encourage
that kind of thing. Sure, I might have suggested a little more reading and
studying how existing systems do the same sorts of things, before diving into
code (especially that awful home-grown crypto), but I'd still try to be
supportive. The problems only start when someone decides to start chasing
money instead of technology. After all, this is a hot area, but new enough
that many users don't know how to tell the serious players from the charlatans,
so why not try to cash in? So he starts mouthing off about how this is
already a serious contender, even though the people actually writing the
code know it's still years from that. That's no longer OK. That's encouraging
people to use code that will not store their data safely or securely, and I
have a zero tolerance policy for that sort of thing.
My message here is really pretty simple: keep coding, keep experimenting, if I
have offended anyone's technical sensibilities I sincerely apologize, but
for heaven's sake somebody get @tkojm to STFU until it's done. Fix the crypto,
fix the performance, fix the packaging and UI. Then we'll have something real
to talk about. Maybe, if I find more time than has already been wasted, I'll
even dig in and submit a patch or two, fix some of that egregiously bad
performance. But not if I keep hearing how it's already better than anything