In a valiant attempt to wean myself off of doomscrolling, I thought I’d try to write a few blog posts this summer. This one highlights some of the exciting things that happened over the past few months in Everest, and specifically around the HACL and EverCrypt projects.
New bindings for Everest cryptography
The OCaml bindings were authored by Victor Dumitrescu. Thanks to Victor’s patches, KreMLin, in addition to generating C code from F*, now also generates matching ocaml-ctypes bindings for the generated C code. The resulting OCaml package is called hacl-star-raw. The API in hacl-star-raw is very low-level and does not enforce any of the preconditions present in the F* source code. To make things much more pleasant to use, Victor also authored a separate package that wraps hacl-star-raw with a set of very idiomatic bindings that use functors, nice high-level signatures and types. This latter package is called simply hacl-star and can be installed with opam install hacl-star.
One of the very nice things about calling HACL* or EverCrypt from OCaml is that all of the static preconditions can be checked at run-time! The functions that HACL* or EverCrypt expose to clients have preconditions of the form disjoint x y, where x and y are arrays, or preconditions of the form length x = l. Because we bind arrays as bytes in OCaml, the former check boils down to == and the latter check boils down to Bytes.length. So, your OCaml program will not segfault if you misuse the API – it is completely safe.
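To make the idea concrete, here is a minimal C sketch of the same layering: an unchecked primitive whose preconditions were established once by proof, and a checked wrapper that re-validates them dynamically, the way the idiomatic OCaml layer does. All names here (raw_hash32, checked_hash32) are hypothetical, not the actual HACL* API, and the disjointness test is simplified to pointer equality (which is what physical equality on boxed OCaml bytes amounts to; plain C code would check for range overlap instead).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical low-level primitive standing in for a HACL* export: it
   trusts the caller to pass a 32-byte output buffer disjoint from the
   input, because those preconditions were proved in F*. */
static void raw_hash32(uint8_t *dst, const uint8_t *src, size_t src_len) {
    memset(dst, 0, 32);
    for (size_t i = 0; i < src_len; i++)
        dst[i % 32] ^= src[i];          /* toy mixing, not a real hash */
}

/* Checked wrapper: re-establishes both flavors of precondition at
   run-time before calling into the unchecked code. */
static bool checked_hash32(uint8_t *dst, size_t dst_len,
                           const uint8_t *src, size_t src_len) {
    if (dst == src)                     /* "disjoint dst src" */
        return false;
    if (dst_len != 32)                  /* "length dst = 32"  */
        return false;
    raw_hash32(dst, src, src_len);
    return true;
}
```

A misuse (aliased buffers, wrong output length) is rejected cleanly instead of corrupting memory, which is exactly the guarantee the OCaml bindings give for free.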
Of course, there’s quite a gap between writing a paper and finishing the packaging, integrating the work under CI, fixing the long tail of small bugs that prevent a smooth integration, etc. So, it wasn’t until a few months ago that we were able to finally declare victory and publish WHACL*. The corresponding package can be installed with npm install hacl-wasm.
The documentation for both of these packages is online.
Much improved packaging and distribution
Over the past few months, we’ve steadily made great progress ensuring our code can be consumed easily by anyone interested. This required work in many areas:
The generated C code is now under version control on the master branch and refreshed automatically every night (contribution by Franziskus Kiefer). This means that you can always get the latest and freshest code!
There is now a configure script that performs some amount of auto-detection for your toolchain. This was important to enable compilation on Arm, where some files just flat out don’t compile, because, say, they use 256-bit wide vector instructions, for which there exists no ARM implementation.
We’ve signed up for Travis CI to try compiling the generated C code. We are now testing six different configurations: Linux/ARM64, Linux/AMD64 (4 variants), Windows/AMD64. This was motivated by the wonderful and humbling cornucopia of compiler and toolchain bugs we found. Among the most delightful issues we had:
- a version of Xcode refused to compile our inline assembly because its register allocator bailed; no one was able to reproduce it, not even Apple, so it looks like Travis is using the one exact build of Xcode with the problem, and this build cannot be found anywhere else!
- GCC swapped the order of arguments of an intrinsic at some point during its lifetime, but clang, which also defines __GNUC__, always had the right order to begin with
- after fixing the order of arguments, GCC still miscompiled the intrinsic
- no single toolchain agrees on a uniform way to securely zero-out memory
- compiler intrinsics are not found in the same header in MSVC vs. the rest of the world
- and so many more…
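The secure-zeroing point deserves a concrete illustration. A plain memset of a key buffer that is about to go out of scope is a dead store, which optimizing compilers happily delete; every platform offers its own non-deletable variant (SecureZeroMemory on Windows, explicit_bzero on recent glibc and the BSDs, memset_s where C11 Annex K is implemented), hence no single portable spelling. The sketch below shows one common portable fallback, writing through a volatile pointer; this is illustrative, not the code HACL* actually ships.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable fallback for wiping secrets: the volatile-qualified pointer
   forces the compiler to perform every store, so the loop cannot be
   removed as a dead-store optimization. Platform-specific primitives
   (SecureZeroMemory, explicit_bzero, memset_s) are preferable when
   available, which is exactly why no uniform spelling exists. */
static void wipe(void *buf, size_t len) {
    volatile uint8_t *p = (volatile uint8_t *)buf;
    for (size_t i = 0; i < len; i++)
        p[i] = 0;
}
```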
As always, this is the most unrewarding kind of work: anyone would rather be doing cool proofs than battling with Travis and debugging toolchain issues. But it’s my firm belief that high-quality packaging is essential to ensure the success of Everest crypto. So, extra thanks to Victor Dumitrescu, Natalia Kulatova, Marina Polubelova and Santiago Zanella-Béguelin for their help debugging and nailing down some of these most vexing issues.
New incremental APIs
Of course, many things happened on the code side, with new algorithms, new vectorized APIs, and an increased usage of meta-programming that has significantly lowered our code-to-proof ratio. All of this is covered in our ePrint, submitted with amazing collaborators Marina Polubelova, Karthikeyan Bhargavan, Benjamin Beurdouche, Aymeric Fromherz, Natalia Kulatova and Santiago Zanella-Béguelin.
But rather than enumerate a long list of improvements, I’ll just focus on one piece of work that I’m particularly excited about.
Many cryptographic primitives fall within a given family of algorithms that share some common characteristics. For instance, Merkle-Damgård hashes, Poly1305 and Blake2 are all block algorithms. This means that they process exactly one block of data at a time; clients who can’t provide data block-by-block must perform buffering themselves, a classic source of bugs owing to modulo computations. Furthermore, block algorithms require clients to follow a very precise state machine, with a sequence of functions oftentimes referred to as init, update and finish – it is easy for clients to get this wrong.
Finally, there are very subtle but essential differences in the state machines; for instance, clients may never call Blake2’s update_last with an empty block, even though it’s ok to do so for SHA2.
In short, init/update/finish is a very error-prone, low-level API, and clients are probably better off using higher-level APIs that take care of the buffering, state machine management, and abstract away the idiosyncrasies of each algorithm.
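To see why the low-level shape is so easy to misuse, here is a toy block algorithm in C (a plain word sum with a 4-byte block; every name is illustrative, none of this is the HACL* API) together with the modulo arithmetic a client must write by hand to hash an arbitrary-length message:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 4

typedef struct { uint32_t acc; } toy_state;

static void toy_init(toy_state *s) { s->acc = 0; }

/* Precondition: data holds exactly nblocks * BLOCK bytes. */
static void toy_update_blocks(toy_state *s, const uint8_t *d, size_t nblocks) {
    for (size_t i = 0; i < nblocks; i++, d += BLOCK)
        s->acc += (uint32_t)d[0] | (uint32_t)d[1] << 8
                | (uint32_t)d[2] << 16 | (uint32_t)d[3] << 24;
}

/* Precondition: len < BLOCK. Here an empty last block is fine, as in
   SHA2; Blake2 forbids it -- exactly the idiosyncrasy noted above. */
static void toy_update_last(toy_state *s, const uint8_t *d, size_t len) {
    uint8_t last[BLOCK] = { 0 };        /* zero-pad the trailing bytes */
    memcpy(last, d, len);
    toy_update_blocks(s, last, 1);
}

static uint32_t toy_finish(const toy_state *s) { return s->acc; }

/* The buffering burden on every client: split into full blocks, carry
   the remainder, and call update_last exactly once, at the end. */
static uint32_t toy_hash(const uint8_t *msg, size_t len) {
    toy_state st;
    toy_init(&st);
    toy_update_blocks(&st, msg, len / BLOCK);            /* full blocks    */
    toy_update_last(&st, msg + (len / BLOCK) * BLOCK,
                    len % BLOCK);                        /* trailing bytes */
    return toy_finish(&st);
}
```

An off-by-one in the division, a forgotten remainder, or a second call to update_last all type-check and silently compute the wrong digest, which is the class of bugs the verified streaming layer rules out.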
Naturally, this is where formal verification comes in. We decided to start looking higher up the stack, beyond bare cryptographic primitives. In our latest work, we tackle the verification of the high-level APIs that perform internal buffer management and rule out state machine errors by copying internal block state as needed.
We have over a dozen implementations in our tree that are eligible for a high-level API. Writing a copy of the high-level API for each would be bad engineering, a poor use of our time, and of course, not very much fun.
Instead, we wrote a functor that takes as an argument a block-based algorithm, and generates via meta-programming a Low* implementation of its corresponding high-level API. The high-level API has a trivial state machine that admits no user errors; clients can feed the data byte-by-byte thanks to internal buffering; and by virtue of being the application of a single functor, all high-level APIs are meant to be used in the exact same fashion, where unpleasant state machine differences have all been abstracted away.
Thanks to a judicious use of meta-programming and very fine-grained control of meta-time partial evaluation, the resulting code has no cruft. The functor has several tweaking knobs, controlling for instance whether the resulting Low* code will have to perform key management (e.g. Poly1305), or whether this is not needed (e.g. SHA2). In the latter case, the relevant struct fields and function parameters get eliminated via partial evaluation.
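A hand-rolled C analogue may help picture what the functor produces. Below, a block algorithm is packaged as a record of operations, and the streaming layer is written once against that record: it buffers at most one partial block so clients can feed bytes at any rate, and finish runs on a copy of the inner state so the stream stays valid afterwards. This is a sketch under my own names, not the generated code; in particular the real functor is applied at meta-time in F*, so the function pointers below are partially evaluated away rather than paid for at run-time.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A block algorithm, as a record of operations (the functor's argument). */
typedef struct {
    size_t block_len;
    size_t state_len;
    void (*init)(void *st);
    void (*update_blocks)(void *st, const uint8_t *d, size_t nblocks);
    void (*update_last)(void *st, const uint8_t *d, size_t len);
    uint32_t (*finish)(const void *st);
} block_alg;

/* Streaming state: inner block state + at most one partial block. */
typedef struct {
    const block_alg *alg;
    void *inner;
    uint8_t *buf;
    size_t buffered;    /* invariant: buffered < alg->block_len */
} stream;

static stream *stream_create(const block_alg *alg) {
    stream *s = malloc(sizeof *s);
    s->alg = alg;
    s->inner = malloc(alg->state_len);
    s->buf = malloc(alg->block_len);
    s->buffered = 0;
    alg->init(s->inner);
    return s;
}

/* Clients may feed data at any granularity, even byte-by-byte. */
static void stream_update(stream *s, const uint8_t *d, size_t len) {
    size_t bl = s->alg->block_len;
    if (s->buffered > 0) {                       /* top up the partial block */
        size_t take = bl - s->buffered < len ? bl - s->buffered : len;
        memcpy(s->buf + s->buffered, d, take);
        s->buffered += take; d += take; len -= take;
        if (s->buffered == bl) {
            s->alg->update_blocks(s->inner, s->buf, 1);
            s->buffered = 0;
        }
    }
    s->alg->update_blocks(s->inner, d, len / bl);       /* bulk of the data */
    memcpy(s->buf + s->buffered, d + (len / bl) * bl, len % bl);
    s->buffered += len % bl;                            /* stash remainder  */
}

/* finish never invalidates the stream: update_last runs on a *copy*
   of the inner state -- the state-copy trick mentioned above. */
static uint32_t stream_finish(const stream *s) {
    void *copy = malloc(s->alg->state_len);
    memcpy(copy, s->inner, s->alg->state_len);
    s->alg->update_last(copy, s->buf, s->buffered);
    uint32_t r = s->alg->finish(copy);
    free(copy);
    return r;
}

static void stream_free(stream *s) { free(s->inner); free(s->buf); free(s); }

/* One toy instance of the record: a 4-byte-block rolling checksum. */
typedef struct { uint32_t acc; } toy_state;
static void toy_init(void *st) { ((toy_state *)st)->acc = 17; }
static void toy_absorb(void *st, const uint8_t *d, size_t n) {
    for (size_t i = 0; i < n; i++)
        ((toy_state *)st)->acc = ((toy_state *)st)->acc * 31u + d[i];
}
static void toy_blocks(void *st, const uint8_t *d, size_t nb) {
    toy_absorb(st, d, nb * 4);
}
static uint32_t toy_fin(const void *st) { return ((const toy_state *)st)->acc; }

static const block_alg toy =
    { 4, sizeof(toy_state), toy_init, toy_blocks, toy_absorb, toy_fin };
```

The payoff is the trivial state machine: any interleaving of update and finish is legal, and every instance of the functor is driven the same way.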
These new high-level APIs are now all available on master, and are known as the “Streaming” APIs. Thanks to Aymeric Fromherz and heroic work by Son Ho, Blake2 has been brought under the same API, meaning we can now trivially generate a byte-by-byte API for any implementation of SHA, Poly1305 or Blake2.
We plan to add SHA3 to the set of available streaming APIs, and also adopt the same approach for other families of algorithms, such as CTR encryption. And, of course, write a paper about this all. Stay tuned!