In my copious spare time (a little joke; I worked 75 hours last week) I’ve been turning over some ideas in my head for a system that combines aspects of backup/restore, file synchronization, and basic version control. In some ways it’s similar to Unison (which might be a good starting point for the codebase) with a few significant enhancements:

  • Instead of both directories being treated as peers, one would be treated as a “repository” with a complete history of its previous states, and the other as a working directory.
  • In addition to file synchronization (basically a union or merge of two directories) there would be functions to force the repository to become like the working directory or vice versa, or to make one repository like another.
  • Various other things people expect in a backup system, such as checking/clearing archive flags, compression, etc.
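
To make the difference between "synchronize" and "force" concrete, here is a rough sketch in Python. It assumes (my assumption, nothing more) that each directory state is a plain dictionary from file name to some content identifier; the names `merge` and `force_plan` are made up for illustration and do not come from Unison or any other tool.

```python
def merge(repo, work):
    """Union of two states. A name present in both with different
    content is reported as a conflict and the repo version is kept."""
    merged = dict(repo)
    conflicts = []
    for name, content_id in work.items():
        if name in repo and repo[name] != content_id:
            conflicts.append(name)
        else:
            merged[name] = content_id
    return merged, sorted(conflicts)


def force_plan(target, source):
    """Work out what must change to make `target` identical to `source`.
    Returns (names to copy from source, names to delete from target)."""
    to_copy = sorted(n for n, cid in source.items() if target.get(n) != cid)
    to_delete = sorted(n for n in target if n not in source)
    return to_copy, to_delete


repo = {"a.txt": "h1", "b.txt": "h2"}
work = {"a.txt": "h1", "b.txt": "h3", "c.txt": "h4"}

merged, conflicts = merge(repo, work)
to_copy, to_delete = force_plan(repo, work)  # make the repo like the working dir
```

The same `force_plan` call with the arguments swapped covers the "vice versa" case, and with two repository states it covers making one repository like another. What to do about conflicts in a real merge is left open here.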

The key to this is a content-addressable file store (using hashes for which accidental collisions are vanishingly unlikely, though they need not be cryptographically strong). On top of that, a repository or snapshot thereof becomes mostly a map from file names to their content hashes and can thus be quite space-efficient.
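
As a minimal sketch of that idea, assuming an in-memory store and SHA-256 purely because it ships with Python's standard library (it is stronger than the scheme needs), something like the following would do. `ContentStore` and `take_snapshot` are hypothetical names of my own, not an existing API.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: blobs are keyed by their own hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        # SHA-256 is an assumption made for convenience; any hash with a
        # negligible accidental-collision rate would serve.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # identical content is stored once
        return digest

    def get(self, digest):
        return self._blobs[digest]

    def __len__(self):
        return len(self._blobs)


def take_snapshot(store, files):
    """A snapshot is just a map from file name to content hash."""
    return {name: store.put(data) for name, data in files.items()}


store = ContentStore()
monday = take_snapshot(store, {"notes.txt": b"draft", "todo.txt": b"buy milk"})
tuesday = take_snapshot(store, {"notes.txt": b"draft v2", "todo.txt": b"buy milk"})
```

After the second snapshot the store holds three blobs rather than four, because the unchanged `todo.txt` is shared between both days. That sharing is where the space efficiency comes from; a real version would keep blobs on disk and probably compress them.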

There’s a very distinct possibility that I’ve just reinvented something that already exists. If so, then I’d appreciate a pointer to whatever it is, either as a starting point for my own efforts or as a complete solution that I can use right away. I know that many of my readers have ideas related to this, and I would appreciate their suggestions as well.