Amazon has rolled out S3, or Simple Storage Service. Here’s the blurb.

Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.

Price is $0.15 per GB-month of storage and $0.20 per GB of transfer. They actually describe requirements and even design principles on the referenced page, which is kind of cool. The requirements are that the service be – and I quote – scalable, reliable, fast, inexpensive, and simple. OK, yeah, those all sound like good things. The design principles they supposedly use to achieve these goals are as follows.

  • Decentralization
  • Asynchrony
  • Autonomy
  • Local responsibility
  • Controlled concurrency
  • Failure tolerant
  • Controlled parallelism
  • Decompose into small well-understood building blocks
  • Symmetry
  • Simplicity

Again, that’s a lot of mom and apple pie. One thing glaringly absent from the requirements is data consistency, and in fact some of the statements attached to the design principles seem to indicate that it’s not a feature. For example, under “local responsibility” they state that “Each individual component is responsible for achieving its consistency; this is never the burden of its peers.” You might think this is a minor quibble, and that because of their excellent (but undescribed) modularity they can add consistency for those who need it. But whether or not a data store provides a consistent view of data is a fundamental property of that data store. Distributed data stores are all about who has (or can have) which data when, and consistency requirements change that calculus at the most basic level. Like performance or security, or perhaps even more so, consistency is not something you can just tack on as an afterthought, because the decision whether to provide it drives many other design choices, all the way down to network protocols and internal metadata structures.
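To make the consistency point concrete, here is a toy sketch in Python. It is emphatically not how S3 works (Amazon doesn't describe its internals), and every name in it (`Replica`, `AsyncReplicatedStore`, `propagate`, the key `photo.jpg`) is hypothetical. The store acknowledges a write once it reaches one replica and copies it to a second replica later, so two readers can see different values for the same key.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data: a plain dict from key to value (hypothetical)."""
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class AsyncReplicatedStore:
    """Hypothetical toy store: a write is acknowledged once it reaches the
    primary, and is copied to the secondary only when propagate() runs."""
    primary: Replica = field(default_factory=lambda: Replica("primary"))
    secondary: Replica = field(default_factory=lambda: Replica("secondary"))
    pending: list = field(default_factory=list)

    def put(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))  # queued for asynchronous replication

    def get(self, key, replica):
        # A reader may be routed to either replica; nothing forces it to the primary.
        return replica.data.get(key)

    def propagate(self):
        # Deliver queued writes to the secondary, in the order they were made.
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()


store = AsyncReplicatedStore()
store.put("photo.jpg", "v1")
store.propagate()
store.put("photo.jpg", "v2")

fresh = store.get("photo.jpg", store.primary)
stale = store.get("photo.jpg", store.secondary)
print(fresh, stale)  # v2 v1: two readers disagree until propagate() runs

store.propagate()
print(store.get("photo.jpg", store.secondary))  # v2
```

Note that making this toy consistent isn't a bolt-on: `put()` itself would have to change (for example, to write both replicas before returning), which alters latency, failure handling, and what happens when a replica is unreachable. That is the sense in which the choice reaches all the way down.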

Without consistency, you might as well use the Internet Backplane Protocol, which was designed to serve almost exactly the same need as S3, but in an open-standards way. I’m not saying IBP is the greatest thing ever, but I’m not sure Amazon’s proprietary version is either.