Secure Email

Ever since one of the talks at LISA, I've been thinking about secure email. My thoughts are nowhere near complete, but I need to get them out of my head and I do that by writing about them. Apologies in advance.

I've actually been thinking for many years about how email should be overhauled. For at least twenty years the idea that the same message contents get stored over and over again for multiple users, even on the same system, has bugged me. Sure, nowadays we have deduplication, but that's a hack. At the time an email is sent, we know for almost zero cost and with absolute certainty that the body is the same for every recipient. Why rely on expensive and approximate deduplication to make up for the fact that we were too stupid to take advantage of that information within the email system itself? For those same twenty-plus years, I've been thinking about how to implement email by separating storage and notification. The message contents get stored once in a data store that's accessible to the sender and recipients, then pointers to those contents are sent separately. In fact, I would be surprised if large email services such as those run by Google or Yahoo don't work that way for messages sent among their own subscribers.

Unfortunately, this approach is incompatible with the current email protocols such as IMAP and SMTP. They don't separate storage and notification that way. Sure, you can do it all in the servers, but then you have the same problem as with most cloud-storage services that do something similar: if the provider has your ciphertext and your keys, they might as well have your cleartext. They can talk all they like about how carefully they manage those keys, but it's all bullshit. Some of us were talking about this years ago, and built systems like HekaFS to address it, but were largely ignored. If there's one good thing that has come out of the recent NSA/Google revelations, it's that people finally realize keys have to stay on the client side. Thank you, Edward Snowden.

The way around this is to use a local proxy on the user's machine. On one side it speaks IMAP and/or SMTP. On the other, it speaks the protocols necessary to interact with our secure data store and notification system. This requires only a very tiny bit of extra configuration by the user to point their email program at the proxy instead of a regular server, but then it opens up a whole new world of possibilities that don't exist when trying to preserve legacy protocols throughout the system. Let's look at how this would work in the context of email between users of the same provider.

  1. The sender's email client talks to their proxy, using local SMTP, to send a message.

  2. The sender's proxy generates a new symmetric encryption key and initialization vector (IV) and encrypts the message - including both the contents and the "envelope" metadata. It also generates an HMAC to protect against both corruption and tampering.

  3. The encrypted message, IV, and HMAC are stored in the provider's message store, yielding an ID. The message store can be pretty plain, or it can have all sorts of features to improve security. For example, if traffic analysis to match senders with receivers is a concern (and it should be) then the provider can implement techniques known from Freenet/Tahoe-LAFS to foil such attempts.

  4. Anybody who has the ID from the previous step can now retrieve the message, but it's still encrypted using a unique key. This key is not stored anywhere (except maybe on the sender's machine, but ideally not even then). What we do instead is construct a separate notifier for each recipient, encrypting the message ID and key using that particular recipient's public key.

  5. At this point, the recipient could be notified synchronously, connecting to them via SSL or similar. This provides the best forward secrecy, but also requires that the recipient be online to receive the notification. More often, the notifiers will need to be stored somewhere for later retrieval. In this case, we could use a second kind of distributed data store, much like the message store and with the same potential for additional code to foil traffic analysis etc. Each user is represented by an existing file or object, and sending a message is just a matter of appending a new notifier.

  6. Some time later, a recipient's email client talks to their proxy, this time using local IMAP, to check for messages.

  7. The recipient's proxy fetches their file/object from the notification store, and possibly truncates it back down to zero.

  8. For each notifier received, the proxy extracts the message ID and key, then uses them to fetch the corresponding message from the message store.

  9. Messages are decrypted and translated into IMAP responses to the recipient's email client, as needed.
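
To make steps 2 through 9 a bit more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a real implementation. All the names (`MessageStore`, `NotificationStore`, `send`, `receive`) are made up for this example, and both stores are plain in-memory dictionaries. The cipher is a toy SHA-256 keystream standing in for a real one such as AES, because the standard library doesn't include one. Deriving separate encryption and HMAC keys from the single per-message key is my assumption too. The public-key step is left to caller-supplied `wrap` and `unwrap` functions, which in real life would come from a proper crypto library.

```python
import hashlib
import hmac
import json
import secrets
from typing import Callable, Dict, List

# Placeholders for public-key encryption: (recipient, data) -> data.
Wrap = Callable[[str, bytes], bytes]
Unwrap = Callable[[str, bytes], bytes]


def _subkey(key: bytes, label: bytes) -> bytes:
    # Assumption: derive distinct encryption and HMAC keys from one message key.
    return hashlib.sha256(label + key).digest()


def _keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    # TOY cipher (SHA-256 in counter mode), NOT for real use.
    # It stands in for a real symmetric cipher such as AES.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class MessageStore:
    """Step 3: holds encrypted blobs, keyed by a random ID."""

    def __init__(self) -> None:
        self._blobs: Dict[str, dict] = {}

    def put(self, blob: dict) -> str:
        msg_id = secrets.token_hex(16)
        self._blobs[msg_id] = blob
        return msg_id

    def get(self, msg_id: str) -> dict:
        return self._blobs[msg_id]


class NotificationStore:
    """Step 5: one append-only queue of notifiers per user."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[bytes]] = {}

    def append(self, user: str, notifier: bytes) -> None:
        self._queues.setdefault(user, []).append(notifier)

    def drain(self, user: str) -> List[bytes]:
        # Step 7: fetch everything and truncate the queue back to empty.
        return self._queues.pop(user, [])


def send(message: bytes, recipients: List[str], messages: MessageStore,
         notes: NotificationStore, wrap: Wrap) -> str:
    # Step 2: fresh key and IV per message, then encrypt and compute the HMAC.
    key = secrets.token_bytes(32)
    iv = secrets.token_bytes(16)
    ciphertext = _keystream_xor(_subkey(key, b"enc"), iv, message)
    tag = hmac.new(_subkey(key, b"mac"), iv + ciphertext, hashlib.sha256).digest()
    # Step 3: the body is stored exactly once, no matter how many recipients.
    msg_id = messages.put({"iv": iv, "ciphertext": ciphertext, "hmac": tag})
    # Steps 4-5: one notifier per recipient, carrying the ID and the key.
    payload = json.dumps({"id": msg_id, "key": key.hex()}).encode()
    for recipient in recipients:
        notes.append(recipient, wrap(recipient, payload))
    return msg_id


def receive(user: str, messages: MessageStore, notes: NotificationStore,
            unwrap: Unwrap) -> List[bytes]:
    inbox = []
    for notifier in notes.drain(user):
        # Step 8: recover the ID and key, then fetch the encrypted blob.
        info = json.loads(unwrap(user, notifier))
        key = bytes.fromhex(info["key"])
        blob = messages.get(info["id"])
        expected = hmac.new(_subkey(key, b"mac"), blob["iv"] + blob["ciphertext"],
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, blob["hmac"]):
            raise ValueError("message failed integrity check")
        # Step 9: decrypt; a real proxy would now build IMAP responses.
        inbox.append(_keystream_xor(_subkey(key, b"enc"), blob["iv"], blob["ciphertext"]))
    return inbox
```

The point to notice is that `send` writes the message body once and then appends one small notifier per recipient, which is the storage/notification split described earlier. The provider only ever sees ciphertext and wrapped notifiers, as long as `wrap` really is public-key encryption done on the client.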

This scheme seems as secure as anything I've heard described elsewhere, and neither hard to implement nor hard to use. The biggest problem with it that I can think of is garbage collection. To do that properly, objects in the message store would need to have reference counts, with an authenticated decrement protocol or some such. To start with, I'd probably just avoid that by saying that messages have expiration dates. The provider's guarantee of security matches their guarantee of persistence. If you don't fetch your messages before they expire, too bad. If you want to keep copies longer, then you have to fetch and store them separately, assuming responsibility for securing the copy (or perhaps that's a separate service offered by the same provider).
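
The expiration-date shortcut is simple enough to sketch. The following Python is only an illustration: `ExpiringStore`, its `ttl_seconds` parameter, and the optional `now` argument (there so the behavior can be exercised without waiting) are hypothetical names of mine, and a real store would be persistent and distributed rather than a dictionary.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class ExpiringStore:
    """Message store where every blob carries an expiration time instead of a refcount."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._blobs: Dict[str, Tuple[float, bytes]] = {}

    def put(self, blob: bytes, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        msg_id = secrets.token_hex(16)
        self._blobs[msg_id] = (now + self._ttl, blob)
        return msg_id

    def get(self, msg_id: str, now: Optional[float] = None) -> Optional[bytes]:
        # Expired or unknown messages look the same to the caller: gone.
        now = time.time() if now is None else now
        entry = self._blobs.get(msg_id)
        if entry is None or entry[0] <= now:
            return None
        return entry[1]

    def purge(self, now: Optional[float] = None) -> int:
        # Periodic sweep; no per-recipient bookkeeping is needed.
        now = time.time() if now is None else now
        expired = [k for k, (expires, _) in self._blobs.items() if expires <= now]
        for k in expired:
            del self._blobs[k]
        return len(expired)
```

The sweep never needs to know who the recipients were or whether they fetched anything, which is exactly why this is easier than authenticated reference counting. The cost is the one already mentioned: a recipient who shows up late gets nothing.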

That's all great within a single provider. How well does it extend out to many providers like we have in the real world? Not that well, unfortunately, but I think that's OK. Just having truly secure email within one provider would be useful. It doesn't seem all that hard to come up with new protocols between providers, allowing them properly controlled access to each other's message and notification stores. Thus, providers that use such protocols could create a whole secure-email ecosystem. Perhaps this is what Lavabit and Silent Circle are already doing within the Dark Mail Alliance, but they're being awfully quiet about the details. The key is that secure email practically has to be a separate ecosystem from the email we already have. A lot of the user-facing parts can still be used without too much trouble, but the entire transmission and storage infrastructure will have to change. While I'm sure people can poke all sorts of holes in what I've outlined above, perhaps something in it will provoke some productive thought. The time for keeping ideas in this area to ourselves is over.
