I went to my first meetup last night, at Aprigo in Waltham. The group is about cloud services, and on this particular occasion there were two presentations about EMC’s Atmos – one from EMC themselves, and one from partner TwinStrata. Things were pretty quiet at the beginning, as most people were consuming the Intuit-provided pizza and beer. I always think that’s a bit funny, since it seems to me that the ideal time to eat is when somebody else is talking at the front of the room, and the time when nobody is talking at the front of the room should be prime schmoozing time. Maybe I’m just being too analytical about the whole thing. Anyway, I did get a chance to say hi to Scot Junkin, whom I know from SiCortex and who is now at EMC, and to meet a couple of new people as well.

After a while our host Tsahy Shapsa got things going with a round of quick introductions. There were about sixty people representing the expected mix of entrepreneurs, consultants, and job-seekers. I’m sure some people described themselves as one while actually being another, but that’s OK. That’s what meetups are for, right? Next we started on the EMC presentation about Atmos.

There really wasn’t a whole lot that I didn’t already know from the public information on the EMC website, but I did jot down a few of the less obvious points.

  • Filesystem access is provided through a dedicated gateway, which uses FUSE, so there are two network hops plus FUSE overhead.
  • Despite the fact that some of the principals worked on OceanStore, where erasure coding was a key technology, the data protection strategy seems to be entirely based on simple replication.
  • Placement policies are reasonably flexible, but I never asked whether they were guided by the correlated-failure work the OceanStore folks had done. I did have the chance, but I didn’t think those present would know the answer, so it wouldn’t have been a good use of the group’s time.
  • Pricing is for the use of a capability, not for its mere presence – as it should be for anything cloud-oriented.
  • They’ll be rolling out a compute service “by the end of the year” to support in-data-center access to data. It looks like it’ll be roughly comparable to EC2 plus S3/EBS; there was no mention of supporting other features like SDB/SQS, and of course EMC pricing is likely to keep people on Amazon.
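
To put rough numbers on the replication-versus-erasure-coding point above, here's a minimal sketch of the raw-capacity trade-off. The 3-copy and 10+4 figures are my own illustrative assumptions, not anything EMC said about Atmos, and the function names are made up for the example.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per usable byte when keeping `copies` full replicas."""
    return Fraction(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Raw bytes stored per usable byte for a (k data + m parity) erasure code."""
    return Fraction(data_fragments + parity_fragments, data_fragments)


def replication_failures_tolerated(copies: int) -> int:
    """Replicas that can be lost while at least one full copy survives."""
    return copies - 1


def erasure_failures_tolerated(parity_fragments: int) -> int:
    """Fragments that can be lost while the object is still reconstructible."""
    return parity_fragments


if __name__ == "__main__":
    # Hypothetical policies chosen only for comparison.
    print("3x replication:", float(replication_overhead(3)), "x raw,",
          replication_failures_tolerated(3), "losses tolerated")
    print("10+4 erasure code:", float(erasure_overhead(10, 4)), "x raw,",
          erasure_failures_tolerated(4), "losses tolerated")
```

The point is just that an erasure code can tolerate more losses for less raw capacity, at the cost of more complicated reads and repairs, which is presumably part of why simple replication is still attractive.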

Perhaps the most contentious moment came when someone asked about consistency. If a read and a write are attempted concurrently across sites, might the reader get stale data or have to wait for the writer? A couple of people in the audience asserted that it must be strong consistency via locking, which I found slightly amazing. Anybody in the cloud space should know about Brewer’s CAP theorem, anybody who works on distributed systems should know about eventual consistency, and anybody who works on computers at all in this day and age should know about optimistic concurrency. To be fair, the person asking the question seemed to know that multiple answers were possible and just wanted to know which ones Atmos had arrived at, but others who were trying to show off their knowledge did a pretty good job of displaying their ignorance instead.
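
For anyone who hasn't run into optimistic concurrency, here's a minimal sketch of the idea: no locks, just a version check at write time, with the loser expected to re-read and retry. This is a toy in-memory store of my own invention (the `VersionedStore` and `VersionConflict` names are hypothetical), and it says nothing about what Atmos actually does.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""


class VersionedStore:
    """Toy in-memory object store using optimistic concurrency control."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key reads as version 0."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Store value only if nobody has written since expected_version."""
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version


if __name__ == "__main__":
    store = VersionedStore()
    # Two writers read the same starting version without taking any lock.
    version_a, _ = store.read("obj")
    version_b, _ = store.read("obj")
    store.write("obj", "from A", version_a)      # first writer wins
    try:
        store.write("obj", "from B", version_b)  # based on a stale version
    except VersionConflict as err:
        print("conflict:", err)
        version_b, _ = store.read("obj")         # re-read, then retry
        store.write("obj", "from B", version_b)
    print(store.read("obj"))
```

Nobody ever waits on a lock here; the cost is that a writer occasionally has to redo its work, which is usually a good trade when conflicts are rare.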

The TwinStrata presentation was shorter, so I have less to say about it. They offer compression but not de-dup, though when they were asked they did point out that their replication uses delta compression and thus provides much the same advantage there as de-dup. That’s actually a pretty good answer, though some people would probably still like to see full de-dup on data both in flight and at rest. A couple of other quick points:

  • TwinStrata licensing is by usable capacity, though (of course) Atmos charges are for raw capacity used.
  • I was curious about what the mapping was from TwinStrata entities – volumes, blocks, or something in between – to Atmos objects. A lot of people woke up and got ready to take notes, but the TwinStrata folks weren’t able to answer.

I asked about availability, and it seems that they don’t really have their own HA strategy. If a virtual appliance fails and you’re using asynchronous replication, it sounds like you’d better make sure your local cache is on shared storage, and you’ll have to handle failover to another appliance using that shared storage yourself. Since they only support asynchronous replication, you don’t have the option of running their stuff synchronously and Atmos asynchronously. Alas, they didn’t seem to understand that I was talking about a local private Atmos instance connected via a fast interconnect, and kept talking about WAN latencies. Since they offer block (iSCSI) service, there’s also another level of caching at the server/host that needs to be considered, but I didn’t get around to bringing that up. They did mention that they don’t have anyone using their product for tier-one storage, and aren’t pushing it as such. I think these are the sorts of issues they’ll need to think about some more if they want to change that.

The two talks had run kind of long, and our hosts had to clean up, so there wasn’t too much schmooze time at the end. I did chat with a couple of folks and exchange some contact info, though. All in all, it was a good and productive time. I hope I’ll be able to tap into this resource some more to refine my definition of “cloud filesystem” over the next while.