Comments for HekaFS (formerly known as CloudFS)

Comment on GlusterFS vs. Ceph by Mattias Eliasson, Tue, 26 Mar 2013 22:47:54 +0000: It should be noted that besides scalability, Ceph also has self-healing and self-management as design goals. If you want performance, you could add OpenAFS on top of it for aggressive local caching.

These design goals make Ceph a lot easier to deploy than GlusterFS, and for that reason alone I prefer Ceph.

Of course it's not entirely plug-and-play, but it's far closer to that than any other cloud storage I have tried. That includes OpenAFS.

Personally, I plan to use Ceph even as a local file system, to spread data among disks; there I see it as far superior to ZFS and other local enterprise file systems.

On another issue… there is more than one distributed file system out there, and when I look at GlusterFS and Ceph I see a lot of redundant code in both projects. I have lost count of how many CRC32 implementations I have seen in my life, for example (I don't remember whether GlusterFS has one, but it's still a good example). The real problem is that a lot of processors these days have CRC32 in hardware. Solaris has system APIs for such algorithms, with a system-wide implementation, and on UltraSPARC that implementation is in hardware. That's one issue I have with code redundancy. The other is that it also means bug redundancy: if there is an error in many CRC32 implementations, how do I know which applications are affected? If all of them used a common shared library, I would know for certain.

This is something that both GlusterFS and Ceph developers should consider: share as much code as possible, and do it in some easy-to-use packaging. Of course, in order to have system-wide libraries replace all per-application implementations, this may require a license other than the GPL.

There is also a lot of public-domain code in the SQLite project that overlaps with cluster file system projects, CRC32 for example.
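The shared-implementation idea the commenter is arguing for can be sketched in Python, whose standard library exposes zlib's single CRC32 routine instead of each program carrying its own copy; this is an illustration of the pattern, not code from either project:

```python
import zlib

# One system-provided CRC32 instead of a per-application copy.
data = b"glusterfs and ceph could share this routine"
full = zlib.crc32(data)

# The same routine supports incremental (streaming) use via the
# optional second argument, which is what a file system that
# checksums large files chunk by chunk needs:
running = zlib.crc32(data[:10])           # checksum of the first chunk
running = zlib.crc32(data[10:], running)  # continue from the prior state
assert running == full
```

Because every caller goes through the same shared routine, a bug fix (or a hardware-accelerated replacement) in that one place benefits every application at once, which is exactly the point about bug redundancy above.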

Comment on GlusterFS vs. Ceph by John Wallace, Tue, 12 Mar 2013 14:12:37 +0000: Thanks for doing this work!! We are looking for a new distributed file system to possibly replace our 20-year-old OpenAFS cell, and Ceph and GlusterFS are both on our list.

Comment on GlusterFS vs. Ceph by Sunghost, Thu, 07 Mar 2013 15:51:45 +0000: Thanks for that. I am currently looking for a good system for video streaming and tried MooseFS, which performed under 50 Mbit/s. Now I am looking for a better system; would you use GlusterFS for that?

Comment on GlusterFS, cscope, and vim (oh my!) by Jules Wang, Wed, 06 Mar 2013 08:03:38 +0000: Having changed the hot-key, I prefer ctrl-j followed by c/e/etc.

And you may find ctrl-], ctrl-i, and ctrl-o helpful. :-)

Comment on Gdb Macros For GlusterFS by Tom Tromey, Tue, 26 Feb 2013 21:01:10 +0000: You're probably better off writing pretty-printers in Python. See

There’s more documentation about how printers are selected and how they are loaded that is also worth reading.

There are a few benefits to using pretty-printers. They work directly with "print", with no separate commands needed, which means they also work in stack traces. They're also integrated into MI, so they work properly with (modern) GUIs. And since you're writing in Python, you have more facilities than the overly simplistic gdb CLI language.
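Tom's suggestion can be sketched as follows. This is a minimal, hypothetical printer for an inode-like struct; the `inode_t` type and its `gfid`/`ref` field names are invented for illustration, not GlusterFS's real layout:

```python
# Hypothetical gdb pretty-printer sketch.  Inside gdb this file
# would be loaded with "source inode_printer.py" (or via auto-load).

class InodePrinter:
    """Render an inode-like struct as one readable line."""

    def __init__(self, val):
        self.val = val  # a gdb.Value wrapping the struct

    def to_string(self):
        # gdb.Value supports dict-style access to struct fields.
        return "inode(gfid=%s, ref=%s)" % (self.val["gfid"], self.val["ref"])

def lookup(val):
    # gdb calls this for every value it is about to print;
    # return a printer for matching types, None otherwise.
    if str(val.type) == "inode_t":
        return InodePrinter(val)
    return None

try:
    import gdb  # the gdb module only exists inside a gdb session
    gdb.pretty_printers.append(lookup)
except ImportError:
    pass  # running outside gdb, e.g. for a quick unit test
```

Once registered, a plain `print some_inode` in gdb (and any backtrace or MI-driven GUI that displays that value) goes through `to_string`, which is the "no separate commands needed" benefit described above.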

Comment on Two Kinds of Open Source by Jeff Darcy, Sun, 10 Feb 2013 17:16:30 +0000: That's a tough one, nicu. One common solution is to do both, in different contexts. The best-known example is probably Fedora vs. RHEL. The upstream/community version can be very bleeding-edge at a possible cost in stability or polish, while the downstream/corporate version can turn that on its head. (In fact Fedora does better than a lot of projects in terms of completeness, so perhaps it's a bad example.) My hope is that upstream GlusterFS and downstream Red Hat Storage can apply this same kind of model, though today I think they're still too intertwined. It's especially hard to maintain that separation when practically all of the developers are from a single company.

In other cases, the answer might just be to pick one model and stick with it. Since it's hard to get companies to devote resources to "maybe it'll work someday" projects, this will generally tend toward the more heavyweight do-everything model. Basically, once a company is basing its own products on something, it should expect to devote some resources to rounding out the package instead of expecting a ready-to-sell package to come from upstream.

Comment on Two Kinds of Open Source by nicu, Sun, 10 Feb 2013 11:38:49 +0000: If the company is selling the stuff for real money, they should also ship the software on schedule (while making sure it is *good*); customers depend on those schedules.

Anyway, how about the third case, found very often, where the software is created by a community composed of both companies and individuals working in their spare time?

Comment on GlusterFS vs. Ceph by Jeff Darcy, Sat, 26 Jan 2013 04:23:31 +0000: I was able to run some tests on a couple of machines at work: 8x 2.4 GHz Westmere CPUs, 24 GB memory, 7x SAS disks each, running RHS 2.0 (approximately RHEL 6.2) with GlusterFS 3.3.1 and Ceph 0.56 (Bobtail). The numbers for disk were uninteresting in both cases, so I switched to using ramdisks. Then I had to install Fedora 18 in a VM on the client (the servers are still bare metal) because the Ceph kernel client won't even build on a kernel that old, and I didn't really feel like blowing away the OS on one of my machines for a competitor's sake. It's not ideal by any stretch, but at least it doesn't have any random variation from other users.

In that configuration, Ceph actually managed to win significantly for medium thread counts (10-20 threads) but had much greater variability and *huge* CPU usage (over 400% sometimes) in the OSD daemons. This makes for a much more interesting performance/scalability picture with a less clear winner, but I’m too tired after fighting with SELinux all evening to turn the results into pretty graphs right now.

Comment on GlusterFS vs. Ceph by Jeff Darcy, Fri, 25 Jan 2013 12:32:46 +0000: Did you read the first two paragraphs, Jamie? I did try to run on physical servers first, but then I ran into the extreme RHEL-unfriendliness of the Ceph build, and it was easier to spin up a few cloud servers than to reinstall the OS on the physical ones for their sake. That approach also has the advantage that others can reproduce my results.

I’m well aware of the variability in cloud block storage performance, but it is possible to measure and correct for that. When the block-storage performance is stable over a period of time[1] and the filesystem consistently captures less than a quarter of that performance, that’s not the block storage’s fault. Either the filesystem is overly sensitive to general *network* slowness (e.g. by incurring too many round trips per operation and/or by failing to parallelize/pipeline operations) or it has purely internal problems.

[1] Not hard to find on Rackspace or Storm on Demand’s SSD servers. Cleverkite’s servers are consistent, but consistently *bad*. Digital Ocean’s are good for a few brief moments, then get throttled into oblivion. Amazon’s are just all over the map all of the time, which is why I don’t do tests there unless I’m specifically testing a new Amazon instance type. Remember, cloud != Amazon.
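The "measure and correct" step described above can be sketched as normalizing the file system's throughput against the raw block device's throughput measured over the same window. The numbers below are invented for illustration, not Jeff's actual results:

```python
# Hypothetical throughput samples (MB/s) taken in the same window:
# first the raw block device, then the distributed file system
# running on top of it.  All values are made up for illustration.
block_mb_s = [210.0, 205.0, 198.0, 215.0]  # stable raw block storage
fs_mb_s = [44.0, 47.0, 41.0, 46.0]         # file system on top of it

block_avg = sum(block_mb_s) / len(block_mb_s)
fs_avg = sum(fs_mb_s) / len(fs_mb_s)

# "Capture ratio": the fraction of raw block performance that the
# file system actually delivers.  If the block layer is stable over
# the window and this ratio stays under about a quarter, the
# bottleneck is the file system (or its network behavior), not the
# cloud block storage underneath.
capture = fs_avg / block_avg
print("capture ratio: %.1f%%" % (capture * 100))
```

The point of dividing by a contemporaneous block-device baseline is that any drift in the cloud provider's storage performance cancels out of the ratio, which is what makes the comparison meaningful despite the variability Jamie raises.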

Comment on GlusterFS vs. Ceph by Jamie Begin, Thu, 24 Jan 2013 19:06:01 +0000: Can you clarify what you mean by "For these tests I used a pair of 8GB cloud servers"?

Wouldn’t using hardware that you have physical access to be more useful when benchmarking a filesystem? Cloud-based block storage performance is often wildly erratic and there are many underlying variables that can’t be controlled for.