Archive for January, 2011

CloudFS at FUDCon

I’ve written a post on my own blog about being at FUDCon in Tempe, including a presentation on CloudFS. You can read the whole thing there, but I’ll copy the CloudFS-specific part here. As a special bonus, here are the slides.

Somebody pointed out that I’d seemed a bit on edge the night before. Partly that was just the stress of travel and of being an introvert mingling with an unfamiliar group of people, but there’s another factor that I hadn’t even consciously realized until I was writing this post. I’ve presented about CloudFS privately and/or in fairly abstract terms so many times that I’d actually forgotten this was the first truly public presentation about a concrete thing that I’ll actually be delivering in the near future. That’s a big deal. I was a bit concerned at first because they’d put me in the largest room and at five past the hour it was still three-quarters empty. Nobody likes talking to an empty room. Shortly after I started, though, the room was pretty much full – not standing-room-only full, but I don’t remember seeing many empty seats. Not that I was trying too hard to count, of course; I was otherwise occupied. Even better, people were engaged. There were many questions, and they were good questions – questions that to me indicated genuine curiosity and constructive intent, not just the “I’m going to prove I’m smart” or “if you don’t get this one right your project will look silly” kinds of questions that one often gets. The post-presentation chatter even went on so long that Chris had to kick us away from the lectern. Good problem to have. :)

The best part of all, in my opinion, was outside of the talk itself. In at least two other presentations, and in even more hallway conversations, the possibility of using CloudFS to solve some problem or add some functionality came up. Also, at least one person had clearly given the code a pretty detailed look since my talk, asking questions and making comments about internal details that he could not have known about otherwise. That is so cool. It’s all very well to have people’s attention for an hour or so before they move on to the next new thing, but when something you’ve talked about shows up in colleagues’ own thinking about how to solve their own problems, that’s an even surer measure of being on the right track. Thank you, everyone, for letting me be part of the broader progress we’re all making together.

 

Filesystem as a Service

The abstract for Curino et al.’s Relational Cloud: A Database-as-a-Service for the Cloud might seem a bit familiar.

Early DBaaS efforts include Amazon RDS and Microsoft SQL Azure, which are promising in terms of establishing the market need for such a service, but which do not address three important challenges: efficient multi-tenancy, elastic scalability, and database privacy.

Multi-tenancy and privacy are, of course, also goals for CloudFS. This shouldn’t be too much of a surprise, since databases and filesystems often have to contend with the same issues, and it’s the relatively new “as a service” requirement that current offerings in both areas don’t yet address. Some of their conclusions are even the same as for CloudFS:

Efficient multi-tenancy.

One approach to this problem would be to use virtual machines (VMs); a typical design would pack each individual DB instance into a VM and multiple VMs on a single physical machine.

Instead, our approach uses a single database server on each machine, which hosts multiple logical databases.

Privacy. A significant barrier to deploying databases in the cloud is the perceived lack of privacy, which in turn reduces the degree of trust users are willing to place in the system. If clients were to encrypt all the data stored in the DBaaS, then the privacy concerns would largely be eliminated. The question then is, how can the DBaaS execute queries over the encrypted data? In Relational Cloud, we have developed CryptDB, a set of techniques designed to provide privacy (e.g., to prevent administrators from seeing a user’s data) with an acceptable impact on performance (only a 22.5% reduction in throughput on TPC-C in our preliminary experiments). Database administrators can continue to manage and tune the databases, and users are guaranteed data privacy.

It’s not clear how they’re really able to preserve these properties for string-based queries; their examples all use numeric operations, and Figure 2 even seems to show string values stored with no encryption at all. Fortunately, CloudFS doesn’t have to deal with query semantics. It does have to deal with path lookups, though, and I’m still working out ways to provide the same security/privacy for path components (file names) as already exists for file contents.
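To make the path-lookup problem a bit more concrete, here is a minimal sketch of one generic technique: the client replaces each path component with a deterministic keyed token (an HMAC under a per-tenant key), so the server can match lookups without ever seeing a file name. This is purely illustrative and is not a description of how CloudFS does or will handle names. Every name in it (`name_token`, `path_tokens`, `Server`) is hypothetical, and the "encrypted contents" are just opaque bytes.

```python
import hashlib
import hmac
from typing import Dict, List, Optional


def name_token(tenant_key: bytes, parent_token: str, name: str) -> str:
    """Derive an opaque, deterministic token for one path component.

    Mixing in the parent's token means the same file name in two
    different directories produces two unrelated tokens.
    """
    msg = parent_token.encode("ascii") + b"/" + name.encode("utf-8")
    return hmac.new(tenant_key, msg, hashlib.sha256).hexdigest()


def path_tokens(tenant_key: bytes, path: str) -> List[str]:
    """Client side: turn '/a/b/c' into the chain of tokens the server sees."""
    tokens: List[str] = []
    parent = ""  # the root directory has an empty token in this sketch
    for component in (c for c in path.split("/") if c):
        parent = name_token(tenant_key, parent, component)
        tokens.append(parent)
    return tokens


class Server:
    """Server side (hypothetical): stores and finds entries by token only."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def store(self, token: str, encrypted_contents: bytes) -> None:
        self._entries[token] = encrypted_contents

    def lookup(self, token: str) -> Optional[bytes]:
        return self._entries.get(token)


if __name__ == "__main__":
    key = b"tenant-a-secret"  # assumption: a key known only to this tenant
    server = Server()
    leaf = path_tokens(key, "/docs/report.txt")[-1]
    server.store(leaf, b"<ciphertext>")
    print(server.lookup(leaf) is not None)
```

The trade-off is the usual one for deterministic schemes: the server still learns which lookups refer to the same name and how deep each path is, and listing a directory in readable form would need the real names stored encrypted alongside the tokens. Because each tenant has its own key, two tenants using the same path never produce the same token.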

Quibbles aside, it’s still good to see people approaching some of the same problems as CloudFS, and reaching some of the same conclusions. “XYZ as a service” means more than just taking some software that does XYZ in some completely different environment and shoving it into the cloud. Providing something as a service, whether it’s a filesystem or a database or an ice cream machine, requires serious thought about the difference between a service and a thing. Traditional software is a thing. You engage in a one-time transaction and then you have a whole thing all to yourself, independently of whether anyone else has another thing just like it. A service is continual and shared, not only between provider and user but between users as well. All of those relationships – benign, malign, or anywhere in between – have to be considered in the way you build and deploy and offer access to the service. Otherwise you’re providing XYZ as a disservice.