Collecting my thoughts about Torus

The other day, CoreOS announced a new distributed storage system called Torus. Not too surprisingly, a lot of people have asked for my opinion about it, so I might as well collect some of my thoughts here.

First off, let me say that I like the CoreOS team and I welcome new projects in this space - especially when they're open source. When I wrote the first C bindings for etcd, it gave me occasion to interact a bit with Brandon Phillips. He seems like an awesome fellow, and as far as I can tell others on that team are good as well. I think it's great that they're turning their attention to storage. I don't want them to go away, or fail. I want to see them succeed, and teach us all something new.

If I seem negative it's not toward or because of the developers. Like many engineers, I have a strong distaste for excessive marketing, and that's what I find objectionable about the announcement. The claims are not only far beyond anything that has actually been achieved, which is fine for a new project, but also far in excess of anything that experience tells us is likely to be achieved within any relevant period of time. Willingness to tackle unknown problems is great, but these are for the most part not unknown problems. The difficulties are quite well known, and represent hard distributed-system problems. If you want to claim that solutions are imminent, it really helps to demonstrate a thorough understanding of those problems. Instead, we're presented with claims that are vague or misleading, claims that illustrate significant gaps in knowledge, and at least one claim that's blatantly false. Quoting from the announcement:

These distributed storage systems were mostly designed for a regime of small clusters of large machines, rather than the GIFEE approach that focuses on large clusters of inexpensive, “small” machines.

It's not true for Gluster. It's not true for Ceph. It's not true for Lustre, OrangeFS, and so on. It's not even true for Sheepdog, which Torus very strongly resembles. None of these systems were designed for small clusters. It's true that some of them might have more trouble than they should scaling up to hundreds of machines, but those are implementation issues and the work that remains to be done is still less than building a whole new system from scratch.

The same paragraph then continues by talking about the specific problems with high-margin proprietary systems, implying that they're the most relevant alternative. They're not. Already, I've seen many people comparing Torus to open-source solutions, and nobody comparing it to proprietary ones. The omission of other open-source projects from their portrayal stands out as deliberate avoidance of hard questions. So does the lack of any explanation of what makes Torus any better than anything else for containers. Being written in Go doesn't make something container-specific. Neither does using etcd. There's nothing in the announcement about any actual container-oriented features, like multi-tenancy or built-in support for efficient overlays. It's just a vanilla block store using basic algorithms, marketed as good for containers. There's nothing wrong with that; in fact it's quite useful, but it's hardly ground-breaking. Anyone who attended my FAST tutorials on this subject during the three years I gave them could have built something similar in the same six months.

The other part of the announcement that bothers me is this:

Torus includes support for consistent hashing, replication, garbage collection, and pool rebalancing through the internal peer-to-peer API. The design includes the ability to support both encryption and efficient Reed-Solomon error correction in the near future, providing greater assurance of data validity and confidentiality throughout the system.

"Includes support" via an API? Does that mean it's already there, or planned, or just hypothetically possible? The first two seem to be there already. I wouldn't be so sure about any sufficiently transparent and non-disruptive form of rebalancing. Encryption and Reed-Solomon are supposedly in the "near future" but I doubt that future is really so near. The implication is that these will be easy to add, but I think the people who have worked on these for Gluster or Ceph or HDFS or Swift would all disagree. Similarly, there's this from Hacker News:

early versions had POSIX access, though it was terribly messy. We know the architecture can support it, it's just a matter of learning from the mistakes and building something worth supporting.

"Just a matter" eh? It was "just a matter" for CephFS to be implemented on top of RADOS too, but it took multiple genius-level people multiple years to get that where it is today. Saying this is "just" anything sets an unrealistic expectation. I'd expect anyone who actually understands the problem domain to warn people that getting from block-storage simplicity to filesystem complexity is a big step. Such a transition might take a while, or not happen at all. Then there's this:

Some good benchmarks to run:

Linear write speed

dd if=/dev/zero of=/mnt/torus/testfile bs=1K count=4000000

Traditional benchmark

bonnie++ -d /mnt/torus/test -s 8G -u core

Single-threaded sequential 1KB writes for a total of 4GB, without even oflag=sync? Bonnie++? Sorry, but these are not "good benchmarks to run" at all. They're garbage. People who know storage would never suggest these. We're all sick of complaints that these are slow, or slower on distributed systems than on local disks, as though that's avoidable somehow. Anybody who would suggest these is not a storage professional, and should not be making any claims about how long it might take to implement filesystem semantics on top of what Torus already has.
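To make the oflag=sync point concrete, here is a minimal Python sketch. It is not from the Torus announcement and it is not a real benchmark. The helper name timed_write is made up for illustration, and calling fsync after every write is only a rough stand-in for what dd does with oflag=sync. The idea is simply to run the same small sequential writes twice, once buffered and once forced to stable storage, and see how different the two numbers are.

```python
import os
import tempfile
import time


def timed_write(path, block_size, count, sync_each=False):
    """Write `count` zero-filled blocks of `block_size` bytes to `path`.

    Returns (total_bytes, elapsed_seconds). With sync_each=True, every
    block is followed by fsync, a rough approximation of dd's oflag=sync.
    This is a hypothetical helper for illustration only.
    """
    block = bytes(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        for _ in range(count):
            written = 0
            # os.write may write fewer bytes than requested, so loop.
            while written < block_size:
                written += os.write(fd, block[written:])
            if sync_each:
                os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return block_size * count, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "testfile")
        for sync_each in (False, True):
            nbytes, secs = timed_write(target, 1024, 500, sync_each)
            rate = nbytes / max(secs, 1e-9) / 1e6
            label = "fsync per write" if sync_each else "buffered"
            print(f"{label}: {nbytes} bytes in {secs:.4f}s ({rate:.1f} MB/s)")
```

On most systems the buffered run mostly measures how fast the page cache absorbs writes, which says little about the storage system underneath. Neither number says anything about concurrency, random I/O, or mixed workloads, which is where distributed storage actually gets exercised.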

So, again, this is not about the project itself but the messaging around it. For the project itself and the engineers working on it: welcome. Best of luck to you. Feel free to ping me if you want to brainstorm or compare notes. BTW, I'll be in San Francisco at the end of this month. For the marketing folks: get real. You're setting your own engineers up for failure, disappointment, and recriminations. I know you want to paint the best picture you can, but that's no excuse for presenting fiction as fact. That's a perfectly good horse you have there. Maybe that horse will even be good enough to win a race or two some day. Stop trying to tell people it's a unicorn just because you have some ideas about how to graft a horn onto its head.
