In a valiant attempt to wean myself off doomscrolling, I thought I’d try to write a few blog posts this summer. This one highlights some of the exciting things that happened over the past few months in Everest, specifically around the HACL* and EverCrypt projects.

New bindings for Everest cryptography

The big piece of news is that we now have official OCaml and JavaScript bindings for our cryptographic code, a long-standing request from many consumers.

  • The OCaml bindings were authored by Victor Dumitrescu. Thanks to Victor’s patches, KreMLin now generates not only C code from F*, but also matching ocaml-ctypes bindings for that C code. The resulting OCaml package is called hacl-star-raw. The API in hacl-star-raw is very low-level and does not enforce any of the preconditions present in the F* source code. To make things much more pleasant to use, Victor also authored a separate package that wraps hacl-star-raw with a set of very idiomatic bindings, using functors and nice high-level signatures and types. This latter package is called simply hacl-star and can be installed with opam install hacl-star.

    One of the very nice things about calling HACL* or EverCrypt from OCaml is that all of the static preconditions can be checked at run-time! The functions that HACL* or EverCrypt expose to clients have preconditions of the form disjoint x y, where x and y are arrays, or of the form length x = l. Because we bind arrays as bytes in OCaml, the former check boils down to a physical equality test with ==, since two distinct bytes values never overlap, and the latter boils down to Bytes.length. So, your OCaml program will not segfault if you misuse the API: it is completely safe.

  • The JavaScript bindings build on our work on WebAssembly, and were authored by Denis Merigoux. Last year, we published the first fully formalized compilation toolchain from F* to WebAssembly at IEEE S&P’19. In that work, we presented (among other things) a paper formalization of the Low* to WASM compilation, along with its implementation in KreMLin, and its application to HACL*. This gave us WHACL*, the compilation of HACL* to WASM via the formalized toolchain in the KreMLin compiler.

    Of course, there’s quite a gap between writing a paper and finishing the packaging, integrating the work under CI, fixing the long tail of small bugs that prevent a smooth integration, etc. So, it wasn’t until a few months ago that we were able to finally declare victory and publish WHACL*.

    Just like for OCaml, the JavaScript bindings are made up of two layers. The lower layer is the “raw” WASM code as output by the KreMLin compiler. Using this code requires not only knowing the static preconditions that clients must satisfy, but also being aware of the KreMLin calling convention, i.e. how KreMLin-compiled WASM code expects JavaScript clients to lay out arguments in memory before jumping into the WASM code.

    To make things easier for clients, Denis wrote a JavaScript wrapper that hides all of these complexities and offers an idiomatic, “native” JavaScript API based on ArrayBuffers. This has been published in the node package repository and can be installed with npm install hacl-wasm.

    One of the highlights of our JavaScript package is that it now makes it easy to use verified cryptography on the web, on the desktop (e.g. Electron) or on the server (via node). You no longer need to wait for WebCrypto to include the newer algorithms!

The documentation for both of these packages is online.

Much improved packaging and distribution

Over the past few months, we’ve steadily made great progress ensuring our code can be consumed easily by anyone interested. This required work in many areas:

  • The generated C code is now under version control on the master branch and refreshed automatically every night (contribution by Franziskus Kiefer). This means that you can always get the latest and freshest code!

  • There is now a configure script that performs some amount of auto-detection for your toolchain. This was important to enable compilation on ARM, where some files just flat out don’t compile because they use, say, 256-bit wide vector instructions for which there is no ARM equivalent.

  • We’ve signed up for Travis CI to try compiling the generated C code. We are now testing six different configurations: Linux/ARM64, Linux/AMD64 (4 variants), and Windows/AMD64. This was motivated by the wonderful and humbling cornucopia of compiler and toolchain bugs we found. Among the most delightful issues we had:

    • a version of Xcode refused to compile our inline assembly because its register allocator bailed; no one was able to reproduce it, not even Apple, so it looks like Travis is using the one exact build of Xcode with the problem, and that build cannot be found anywhere else!
    • GCC swapped the order of arguments of an intrinsic at some point in its history, but clang, which also defines __GNUC__ (and so cannot easily be told apart from GCC), always had the right order to begin with
    • after fixing the order of arguments, GCC still miscompiled the intrinsic
    • there is no single way to securely zero out memory that all toolchains agree on (see the sketch after this list)
    • compiler intrinsics are not found in the same header in MSVC vs. the rest of the world
    • and so many more…
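
    To make the zeroing issue concrete, here is a sketch of the kind of per-toolchain dispatch that ends up being maintained. The wrapper name is made up and this is not the exact code that ships in our repository, but the primitives it dispatches to are the real options each platform offers:

      #if defined(_WIN32)
      #  include <windows.h>             /* SecureZeroMemory */
      #else
      #  define __STDC_WANT_LIB_EXT1__ 1 /* ask for memset_s where available */
      #  include <string.h>
      #endif
      #include <stddef.h>

      /* Illustrative sketch only, not the dispatch HACL* actually ships. */
      static void wipe(void *buf, size_t len) {
      #if defined(_WIN32)
        /* Windows guarantees SecureZeroMemory is never optimized away. */
        SecureZeroMemory(buf, len);
      #elif defined(__APPLE__)
        /* macOS provides memset_s from C11 Annex K. */
        memset_s(buf, len, 0, len);
      #elif defined(__GLIBC__) && (__GLIBC__ > 2 || __GLIBC_MINOR__ >= 25)
        /* Recent glibc provides explicit_bzero (the BSDs have it too). */
        explicit_bzero(buf, len);
      #else
        /* Portable fallback: write through a volatile pointer so the
           compiler cannot prove the stores dead and delete them. */
        volatile unsigned char *p = (volatile unsigned char *)buf;
        while (len--) *p++ = 0;
      #endif
      }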

As always, this is the most unrewarding kind of work: anyone would rather be doing cool proofs than battling with Travis and debugging toolchain issues. But it’s my firm belief that high-quality packaging is essential to the success of Everest crypto. So, extra thanks to Victor Dumitrescu, Natalia Kulatova, Marina Polubelova and Santiago Zanella-Béguelin for their help debugging and nailing down some of the most vexing of these issues.

New incremental APIs

Of course, many things happened on the code side, with new algorithms, new vectorized APIs, and an increased usage of meta-programming that has significantly lowered our code-to-proof ratio. All of this is covered in our ePrint, submitted with amazing collaborators Marina Polubelova, Karthikeyan Bhargavan, Benjamin Beurdouche, Aymeric Fromherz, Natalia Kulatova and Santiago Zanella-Béguelin.

But rather than enumerate a long list of improvements, I’ll just focus on one piece of work that I’m particularly excited about.

Many cryptographic primitives fall within a family of algorithms that share some common characteristics. For instance, Merkle-Damgård hashes, Poly1305 and Blake2 are all block algorithms. This means that they process exactly one block of data at a time; clients who can’t provide data block-by-block must perform the buffering themselves, a classic source of bugs owing to modulo computations. Furthermore, block algorithms require clients to follow a very precise state machine, with a sequence of functions often referred to as init/update_block/update_last/finish; it is easy for clients to get this wrong. Finally, there are subtle but essential differences between the state machines: for instance, clients may never call Blake2’s update_last with an empty block, even though it’s ok to do so for SHA2.

In short, init/update_block/update_last/finish is a very error-prone, low-level API; clients are probably better off using higher-level APIs that take care of buffering and state-machine management, and that abstract away the idiosyncrasies of each algorithm.
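
To make this concrete, here is roughly what every client of the low-level block API ends up writing by hand. The names below are hypothetical rather than the actual generated ones, but the shape is faithful: manual buffering with modulo arithmetic, and a state machine that must be followed to the letter.

    #include <stdint.h>

    /* Hypothetical low-level block API for a hash with 64-byte blocks;
       the names are illustrative, not the actual generated ones. */
    #define BLOCK_LEN 64
    void hash_init(uint32_t *state);
    void hash_update_block(uint32_t *state, const uint8_t *block);
    void hash_update_last(uint32_t *state, const uint8_t *last, uint32_t last_len);
    void hash_finish(uint32_t *state, uint8_t *digest);

    /* The buffering that every client re-implements by hand. Note that when
       len is a multiple of BLOCK_LEN, update_last is called with last_len = 0:
       fine for SHA2 but, as mentioned above, forbidden for Blake2. */
    void hash_all(const uint8_t *data, uint32_t len, uint8_t *digest) {
      uint32_t state[8];
      uint32_t full = (len / BLOCK_LEN) * BLOCK_LEN;
      hash_init(state);
      for (uint32_t i = 0; i < full; i += BLOCK_LEN)
        hash_update_block(state, data + i);
      hash_update_last(state, data + full, len % BLOCK_LEN);
      hash_finish(state, digest);
    }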

Naturally, this is where formal verification comes in. We decided to start looking higher up the stack, beyond bare cryptographic primitives. In our latest work, we tackle the verification of the high-level APIs that perform internal buffer management and rule out state machine errors by copying internal block state as needed.

We have over a dozen implementations in our tree that are eligible for a high-level API. Writing a copy of the high-level API for each would be bad engineering, a poor use of our time, and of course, not very much fun.

Instead, we wrote a functor that takes as an argument a block-based algorithm, and generates via meta-programming a Low* implementation of its corresponding high-level API. The high-level API has a trivial state machine that admits no user errors; clients can feed the data byte-by-byte thanks to internal buffering; and by virtue of being the application of a single functor, all high-level APIs are meant to be used in the exact same fashion, where unpleasant state machine differences have all been abstracted away.
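
Concretely, a client of one of these generated APIs looks roughly like the following. The names are illustrative, not the exact ones on master, but the create/update/finish/free shape is the point: no block sizes, no buffering, no fragile ordering constraints.

    #include <stdint.h>

    /* Hypothetical signatures for one instance of the generated streaming
       API; the actual names in the tree differ. */
    typedef struct streaming_state_s streaming_state;
    streaming_state *streaming_create_in(void);
    void streaming_update(streaming_state *s, const uint8_t *data, uint32_t len);
    void streaming_finish(streaming_state *s, uint8_t *digest);
    void streaming_free(streaming_state *s);

    /* Data can be fed in whatever chunks it arrives in, even one byte at a
       time; all buffering happens behind the abstract state. */
    void hash_in_chunks(const uint8_t *c1, uint32_t l1,
                        const uint8_t *c2, uint32_t l2,
                        uint8_t *digest) {
      streaming_state *s = streaming_create_in();
      streaming_update(s, c1, l1);
      streaming_update(s, c2, l2);
      /* finish works on a copy of the internal block state, so it can be
         called at any point without invalidating s. */
      streaming_finish(s, digest);
      streaming_free(s);
    }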

Thanks to a judicious use of meta-programming and very fine-grained control of meta-time partial evaluation, the resulting code has no cruft. The functor has several tweaking knobs, controlling for instance whether the resulting Low* code has to perform key management (e.g. Poly1305) or not (e.g. SHA2). In the latter case, the relevant struct fields and function parameters are eliminated via partial evaluation.
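
As a hypothetical illustration of that knob, here is the kind of difference partial evaluation makes in the generated C: for a keyed algorithm the key shows up as a creation-time parameter, while for an unkeyed one the parameter and the corresponding struct field simply do not exist, rather than being left behind as dummies. The names below are made up for the sake of the example.

    #include <stdint.h>

    typedef struct poly1305_streaming_state_s poly1305_streaming_state;
    typedef struct sha2_256_streaming_state_s sha2_256_streaming_state;

    /* Keyed instance (Poly1305-style): the 32-byte one-time key is taken at
       creation time and kept alongside the block state. */
    poly1305_streaming_state *poly1305_streaming_create_in(const uint8_t *key);

    /* Unkeyed instance (SHA2-style): the key parameter and the corresponding
       struct field have been erased by partial evaluation, not stubbed out. */
    sha2_256_streaming_state *sha2_256_streaming_create_in(void);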

These new high-level APIs are now all available on master, and are known as the “Streaming” APIs. Thanks to Aymeric Fromherz and heroic work by Son Ho, Blake2 has been brought under the same API, meaning we can now trivially generate a byte-by-byte API for any implementation of SHA, Poly1305 or Blake2.

We plan to add SHA3 to the set of available streaming APIs, and also adopt the same approach for other families of algorithms, such as CTR encryption. And, of course, write a paper about this all. Stay tuned!