Yes, you read that right – Zettar, not Zetta. Robin Harris mentioned them, so I had a quick look. At first I was going to reply as a comment on Robin’s site, but the reply got long and this site has been idle for a while so I figured I’d post it here instead. I also think there are some serious issues with how Zettar are positioning themselves, and I feel a bit more free expressing my thoughts on those issues here. So, here are my reactions.
First, if I were at Zetta, I’d be livid about a company in a very similar space using a very similar name like this. That’s discourteous at best, even if you don’t think it’s unethical/illegal. It’s annoying when the older company has to spend time in every interaction dispelling any confusion between the two, and that’s if they even get the chance before someone writes them off entirely because they looked at the wrong website. The fact that Robin felt it necessary to add a disambiguating note in his post indicates that it’s a real problem.
Second, Amazon etc. might not sell you their S3 software, but there are other implementations – Project Hail’s tabled, Eucalyptus’s Walrus, ParkPlace. Then there’s OpenStack Storage (Swift) which is not quite the same API but similar enough that anyone who can use one can pretty much use the other. People can and do run all of these in their private clouds just fine.
Third, there are several packages that provide a filesystem interface on top of an S3-like store – s3fs, s3fuse, JungleDisk, even my own VoldFS (despite the name). Are any of these production-ready? Perhaps not, but several are older than Zettar and several are open source. The difference in deployment risk for Zettar vs. alternatives is extremely small.
Lastly, Zettar’s benchmarks are a joke. The comparison between local Zettar and remote S3 is obviously useless, but even the comparison with S3 from within EC2 is deceptive. Let’s just look at some of the things they did wrong:
- They compare domU-to-domU for themselves vs. EC2 instances which are likely to be physically separate.
- They fail to disclose what physical hardware their own tests were run on. It’s all very well to say that you have a domU with an xxx GHz VCPU and yyy GB of memory, but physical hardware matters as well.
- Disk would have mattered if their tests weren’t using laughably small file and total sizes (less than memory).
- 32-bit? EC2 m1.small? Come on.
- They describe one set of results, failing to account for the fact that EC2 performance is well known to vary over time. How many runs did they really do before they picked these results to disclose?
- To measure efficiency, they ran a separate set of tests using KVM on a notebook. WTF? Of course their monitoring showed negligible load, because their test presents negligible load.
There’s nothing useful about these results at all. Even Phoronix can do better, and that’s the very lowest rung of the storage-benchmarking ladder.
This looks like a distinctly non-enterprise product measured in distinctly non-enterprise ways. They even refer to cloud storage 2.0, I guess to set themselves apart from other cloud-storage players. They’ve set themselves apart, all right. Beneath.
Disclaimer: I work directly with the Project Hail developers, less so with the OpenStack Storage folks. My own CloudFS (work) and VoldFS (personal) projects could also be considered to be in the same space. This is all my own personal opinion, nothing to do with Red Hat, etc.
Jeff,
Thank you for taking the time to make comments. Nevertheless, I think it’s time to discuss the points that you made. I would like to provide my views about each below:
Regarding space:
Zetta is in the cloud storage service segment. Zettar Inc. is strictly in the software business. Besides, we think it’s very misleading to say we copied the name just because Zetta is well-funded and has more market visibility compared to us, a bootstrapped effort. Note also that Zetta is a common numeric prefix (10 to the 21st power).
We have been actively using the name Zettar since 2005, also for a storage-related venture. We didn’t notice Zetta until late 2008, when it got funded – insufficient justification IMHO for us to change our name, in which we had already invested in various ways. No one wants such confusion. All I can say is that it’s a coincidence. I don’t believe anyone can accuse us of being unethical.
Regarding other AWS S3 implementations:
In fact, similar points have been raised by another commenter in @robin’s blog. Nevertheless, as I pointed out in my follow-up to these comments, the sandbox is just a prelude of what are coming from us. We have actually duplicated more than just AWS S3.
Our focus has always been to resolve the three big issues that CIOs face or will face should they move to object storage: (1) lack of standards, (2) vendor lock-in, and (3) availability issues. If a “solution” just offers an S3 implementation (such as ParkPlace or Eucalyptus’ Walrus), how does a CIO deal with the above three issues?
Regarding filesystems that frontend AWS S3:
Zettar’s virtual file system, as pointed out by @robin, front-ends multiple object storage clouds concurrently, not just AWS S3. This is a critical difference. The design and implementation are again motivated by our desire to resolve the three big CIO concerns mentioned above. If a file system only front-ends AWS S3, then IMHO it’s insufficient.
EC2/S3 vs Zettar local sandbox:
That you call our benchmark a “joke” is really puzzling. Our benchmark page made is very clear what our test hosts were: hardware, OS, number of CPUs, clock speed, RAM, network connection speed and more.
Furthermore, we made it very clear that identical tests were performed both for the EC2/S3 combo and our own setup. As much as possible, we have tried to make the comparison valid. I would be glad to correspond with you off-line and clarify more for you. As to using a notebook for one of our tests, that’s exactly the point we wish to make: having a local sandbox gives developers the freedom to develop, test, and stage cloud applications even in the comfort and convenience of their own notebook – Internet connection is optional. Here again I fail to see the rationale for your criticisms.
I would recommend that you and your readers re-read @robin’s blog critically once more.
Regards,
Chin Fang, Founder, Zettar Inc.
“as I pointed out in my follow-up to these comments, the sandbox is just a prelude of what are coming from us. We have actually duplicated more than just AWS S3.”
That’s good to hear, and if you’ve been working on this since 2005 it should hardly surprise anyone that there’s more to come, but I can only evaluate what I can see.
“Our benchmark page made is very clear what our test hosts were: hardware, OS, number of CPUs, clock speed, RAM, network connection speed and more.”
I just double-checked http://www.zettar.com/zettar/products/zcloud/benchmarks and I still don’t see that. You describe the *guests* (virtual machines) in reasonable detail, but the only information you give about the physical server used for most of those tests is that it’s in a hosting data center with a 10Mb/s internet connection. The specs you give for the notebook used in the efficiency tests are more complete but irrelevant. Even if you did describe the physical hardware used for the bulk of the tests, comparing to an EC2 m1.small is neither valid nor useful. A more useful test would involve a larger EC2 instance type, virtual client to physical server on your side (to match what you’d be doing with Amazon’s own S3 from EC2), and much larger data sets, with runs repeated to eliminate variance due to contention with other EC2 users. You could also compare against Hail on the exact same hardware, or – since you claim to support multiple object storage clouds and the API differences are in the noise – against OpenStack.
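To make the "repeated runs" point concrete, here is a minimal Python sketch of the kind of harness I have in mind. It is only an illustration: `time_runs`, `summarize`, and the `operation` callable are names I made up for this example, and the callable stands in for whatever PUT/GET sequence is being measured against a given store. The point is simply to report the spread across many runs instead of one hand-picked result.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(operation: Callable[[], int], runs: int = 10) -> List[float]:
    """Run `operation` repeatedly and return throughput (bytes/second) per run.

    `operation` is a hypothetical callable that performs one transfer
    (for example a PUT followed by a GET) and returns the bytes moved.
    """
    throughputs = []
    for _ in range(runs):
        start = time.perf_counter()
        nbytes = operation()
        # Guard against a zero interval on very fast (fake) operations.
        elapsed = max(time.perf_counter() - start, 1e-9)
        throughputs.append(nbytes / elapsed)
    return throughputs


def summarize(samples: List[float]) -> Dict[str, float]:
    """Summarize the spread of the samples, not just a single number."""
    return {
        "runs": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


if __name__ == "__main__":
    payload = bytes(1024 * 1024)

    def fake_transfer() -> int:
        # Stand-in for a real object-store round trip: copy 1 MiB in memory.
        return len(bytearray(payload))

    print(summarize(time_runs(fake_transfer, runs=20)))
```

For a real comparison the same harness would be pointed at each store from the same client, with data sets larger than memory and runs spread over enough time to catch the variation in EC2 performance.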
For all I know, when you release the rest of the code and run real benchmarks, I’ll be blown away by both the functionality and the performance. I’d be the first to congratulate you. All I’m saying is that the information available right now about the program that’s available right now doesn’t even indicate parity with existing open-source projects. Only you can change that, and I hope you do so pretty soon, because I know from both $dayjob and my personal projects that the space you’re looking to enter will be pretty crowded by this time next year.
Update 2011-04-21: I noticed a couple of hits to this article, so I went and took another look. There’s exactly one new blog post about virtual appliances for VMware etc., but there’s no mention of this being anything but a shinier package around the same (minimal) functionality as before. Still unimpressed.