Justin Sheehy at Basho has posted a good article about vector clocks, which has led to an interesting discussion. I highly recommend that you read what’s said there, but my key takeaway is much the same as it was in my own fairly recent article about conflict resolution: when the problem is conflicts in a distributed data system, there is no magic bullet. Vector clocks won’t save you. Client-side vector clocks won’t save you. Conditional writes won’t save you. This is not an issue that can be handled entirely within the storage system, or within a thin isolated layer of your application (which is effectively part of the storage system). Like security or performance, data conflicts are an issue that will pervade any distributed application, and anyone who says otherwise is a charlatan. Application designers should understand the tools available, including the limitations of those tools, and take responsibility for using them to build the application that best satisfies their requirements. It’s a top-to-bottom effort. Get used to it.

A related point I think is worth making is that I’m not saying it’s OK to lose data. As a storage professional for many years, I would never say such a thing, but I will say that people need to be clear about the difference between losing data and allowing data to be overwritten (or deleted) in response to user actions. I will also say that storage systems should be clear about the promises they make, and rigorously keep those promises. I’m perfectly comfortable saying that “last write wins” is a valid rule, with a long tradition of usefulness in filesystems and databases, and that it can continue to govern writes to a single node in a distributed system. Vector clocks still have a lot of value between nodes, where “last write” or any other concept of relative time can be impossible to pin down, but even Justin’s article mentions some limitations. Ultimately, the application (or the user) still has to deal with conflicts. To enable that, it can sometimes be more important for the storage system to offer easily understood behavior than to offer complex behavior that’s “better” only according to certain assumptions about what the user would want. “Pilot error” isn’t likely to be a very welcome response to a user who misunderstood complex rules and then lost data as a result. The rules must be defined with an eye toward minimizing the probability of data loss or corruption throughout the system, not just within one component.
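To make the between-nodes point concrete, here is a minimal sketch of vector clock comparison in Python. It is not taken from Justin’s article or from any particular storage system; the function names (`increment`, `compare`, `merge`) and the node names are my own placeholders. What it shows is that the clocks can tell you two writes were concurrent, but they cannot tell you which one the user wanted.

```python
from typing import Dict

# Hypothetical representation: node name -> count of writes seen from that node.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with this node's counter advanced by one."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b descends from a, so b can safely replace a
    if b_le_a:
        return "after"       # a descends from b, so a can safely replace b
    return "concurrent"      # neither descends from the other: a real conflict


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Clock for a value that has been reconciled from both a and b."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Two nodes each accept a write based on the same starting version.
base = increment({}, "node_a")
write_1 = increment(base, "node_a")
write_2 = increment(base, "node_b")

print(compare(base, write_1))     # before
print(compare(write_1, write_2))  # concurrent
```

The "concurrent" result is where the storage system runs out of answers. Something above it, whether the application or the user, has to choose or combine the two values, and only then does `merge` produce a clock for the reconciled result.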