Long gone are the days of my youth where AOL Messenger and IRC ruled online communication spaces… in this day and age, people use WhatsApp, Signal, Facebook Messenger, or even Instagram Messages (or so I am told). This new generation of messenger apps provides some fundamental technological improvements, such as Unicode Emoji instead of ASCII smileys, or perhaps more interestingly to readers of this blog, End-to-End Encryption.

While I could write several blog posts about the many fascinating facets of Unicode, including segmentation, encodings, normalization, and the creative use of zero-width joiner codepoints to create new emojis… I would like to focus today on MLS, a new standard in the space of secure messaging protocols. Specifically, this blog post focuses on verifying MLS, in order to establish with high assurance that it does, indeed, provide End-to-End Encryption in the context of group messaging. This blog post is an informal version of our paper, which my student Théophile (co-advised with Karthik B at INRIA) will present at Usenix Security this summer.

Our paper sheds lights on what it means to perform secure group management; identifies the sub-component of the standard in charge of this and formalizes it. This yields a reference implementation (for interop testing, or just for other implementors’ inspiration), and also yields bugs or shortcomings in the standard (which are now all fixed). Read on for more details.

A word about MLS

Théophile has written an excellent introduction to MLS. To recap briefly: two-party end-to-end secure messaging has been widely studied, notably through the Signal protocol, which powers many encrypted messengers (notably, those mentioned above, but not AOL Messenger or IRC, obviously). But oftentimes, users have multiple devices, and sharing keys between devices is bad™️. Ergo, every conversation is actually a group conversation between multiple devices, and for that we need something beyond Signal. Several solutions exist, but they’re suboptimal (see Théophile’s blog post), which is why “the industry” has been hard at work at the IETF devising a new standard called MLS that aims to provide end-to-end encryption, for groups.

If that all sounds familiar, it is: other protocols like TLS have been going through the same standardization process at the IETF. But unlike TLS, the academic community was involved from the get-go in the design and security analysis of MLS. Which begs the question: what properties are we trying to establish here?

First, some basics: MLS only takes care of processing messages and maintaining group data structures and corresponding cryptographic material. Crucially, MLS assumes the existence of two important components, along with some behavioral properties they must enjoy. First, a delivery service, which is responsible for carrying messages from one device to all the other participants in the group, retrying if need be, handling eventual consistency, and so on. Second, an identity service (sometimes called a directory) which associates some initial cryptographic material to a given identity, and somehow guarantees that the directory entry for Alice is indeed the correct one.

Now on to the expected properties of MLS. The functional requirements of MLS are perhaps unsurprising: it should be able to deal with group members going offline for periods of time then “catching up”; it should be scalable and avoid some quadratic issues currently found in some of the existing solutions; and it should support long-lived groups where people come and go over time.

The security requirements are the interesting part. We expect the usual guarantees: authenticity (message from Alice is indeed from Alice), confidentiality (Eve who isn’t in the group can’t decipher the messages), forward secrecy (decrypting at some point in time doesn’t compromise previous conversations) and post-compromise recovery (even if the key of Bob is compromised, after a period of healing, the cryptographic material has been refreshed and the attacker can’t eavesdrop any further).

Roster agreement, and its security

All of these guarantees make sense as long as we can trust the membership of the group. This is known as roster agreement, and states that everyone in the group is on the same page as to who exactly belongs to the group. This is a particularly tricky property to establish, and attacks have been found by others, on this exactly. This brings us to our paper. In it, we tackle questions such as: what exactly is secure group management, how does one capture those security properties, and how does one go about proving them.

The first contribution of the paper is to extricate and disentangle the core group management from what is now a very large standard. We call it TreeSync, and present it as its own, generic protocol that operates independently of the others parts of MLS, which are concerned with epoch secrets and deriving message encryption keys from those secrets. This in itself is novel: the current standard does not identify TreeSync, and it can be difficult to understand what is happening in what is now a very large the document. Our work makes it easier to understand the standard from a security and functional standpoint. Thus, TreeSync lives on, with its mandate being to make sure that everyone in the group has signed off (authenticated) the contents of the roster.

I called TreeSync a protocol: it operates on its own data structure (unsurprisingly, a tree), and can process messages to enact addition, deletion, or key refresh of the members currently in the group, meaning the tree evolves over time. Identifying TreeSync as a protocol on its own allows us to i) better understand past attacks as specifically targeting TreeSync, i.e. the group management part of MLS, and ii) to find new attacks that were previously not “visible” due to TreeSync being mixed up with the rest of MLS.

We formalize TreeSync in F*, sprinkle a generous amount of dependent types to encode a variety of invariants, then use the existing DY* framework to reason about protocol security in the symbolic model. As always, using formal reasoning forces us to think about the details: we ended up looking very closely at a core part of TreeSync called “ParentHash”, which computes a digest of the membership tree that TreeSync manages. Informally, the invariant of the membership tree is that each member (leaf) signs (authenticates) the subtree that is still in the same state as the leaf (hasn’t been updated since the leaf was itself changed). This is a crucial piece of information, since it allows new members to verify the signatures to ensure that membership is authenticated (signed) by group members, not by an external attacker.

There were several problems, related to two core optimizations. The first optimization is called “filtered nodes”, and essentially stems from the fact that the membership tree is a complete binary tree, but that in general, some of those nodes are empty. The second optimization is called “unmerged leaves”, and allows de-coupling addition from authentication. Essentially, unmerged leaves are waiting to be authenticated by the next refresh of secrets in the tree (see the paper for a more precise explanation). Overall, our work not only clarified the expected TreeSync invariant, using formal language, but also found several situations in which the invariant could be broken. These in turn defeated the security guarantees of MLS, and in particular could lead to a state of confusion where not all group members agree on whether Alice is in the group or not. All the fixes to the standard stemming from this work have been integrated in the RFC.

There are plenty of other interesting details in the paper, including a signature confusion attack, where the absence of disambiguating label could allow a signature to be reused in another part of the protocol (this is also very bad™️). But I’ll just conclude this brief overview with a note about implementation.

Actually running the code

In this line of research, and especially in the sort of protocols-meet-the-real-world projects that Karthik and I have been pursuing over the past few years, having actual code has always been a priority. This paper makes no exception, and we took great care to ensure that our reference implementation can actually be executed and is not just a bunch of lemmas. This is not just for fun: without serious interop testing, you might be proving great properties about a protocol that turns out to be MLS’, but not the actual MLS.

For this work, with the addition of byte-correct parsers and serializers (more on that in a later blog post), we were able to extract our entire MLS specification. Unlike previous work in Low* that extracts to C, this time, we extracted using F*’s regular extraction pipeline to OCaml. (Side-note: we would like to have a verified implementation of MLS in Rust using Aeneas, more on that also later.) As we had seen previously in other works (Signal*, Noise*…), the performance of the protocol is mostly dominated by its crypto primitives. What we did here is compile MLS to JavaScript using js_of_ocaml, then “plug” the protocol part on top of the efficient WebAssembly crypto primitives from our 2019 work on compiling HACL* to WASM. The result is a very decent implementation of MLS, which we actually integrated into a prototype version of Skype. We were able to successfully converse in secure group settings, and showed that one can be both specification-oriented and have very decent performance.

There is much more to learn about MLS: Théophile is planning an in-depth dive on his blog, meaning I can stop my overview here, and redirect the curious reader to our paper, or the blog. Hit us up if you want a copy of the implementation!