<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="http://jonathan.protzenko.fr/feed.xml" rel="self" type="application/atom+xml" /><link href="http://jonathan.protzenko.fr/" rel="alternate" type="text/html" /><updated>2026-05-05T16:55:55-07:00</updated><id>http://jonathan.protzenko.fr/feed.xml</id><title type="html">Jonathan Protzenko</title><subtitle>Google Information Security Engineering (Seattle)</subtitle><entry><title type="html">A case study with Aeneas and jxl-rs</title><link href="http://jonathan.protzenko.fr/2026/05/05/jxl-rs.html" rel="alternate" type="text/html" title="A case study with Aeneas and jxl-rs" /><published>2026-05-05T08:00:00-07:00</published><updated>2026-05-05T08:00:00-07:00</updated><id>http://jonathan.protzenko.fr/2026/05/05/jxl-rs</id><content type="html" xml:base="http://jonathan.protzenko.fr/2026/05/05/jxl-rs.html"><![CDATA[<p>Aeneas is a toolchain for verifying Rust programs, relying on a functional
translation to Lean. Aeneas is now nearly five years old, and many excellent
improvements have landed recently, thanks to the work of <a href="https://www.sonho.fr/">Son Ho</a>,
<a href="https://aymericfromherz.github.io/">Aymeric Fromherz</a>, Guillaume Boisseau and
many other collaborators. With this blog post, I’m hoping to showcase what proofs
in Aeneas look like these days, and highlight recent improvements, by relying on
a sample problem given to me by my colleague <a href="https://www.lucaversari.it/">Luca
Versari</a>. I think this blog post can also serve as a nice
example / tutorial on how to work with Aeneas on a real-world piece of code.</p>

<p>Before going any further: more info about Aeneas is available on
<a href="https://github.com/AeneasVerif/aeneas">GitHub</a>, and most communication happens
on our <a href="https://aeneas-verif.zulipchat.com/">Zulip</a> – subscribe to the
newsletter channel there if you’d like to stay updated. Finally, the science is
in our <a href="https://arxiv.org/abs/2206.07185">ICFP’22</a> and <a href="https://arxiv.org/abs/2404.02680">ICFP’24</a> papers.</p>

<h1 id="background-jxl-rs">Background: jxl-rs</h1>

<p><code class="language-plaintext highlighter-rouge">jxl-rs</code> is a rewrite of the jpeg-xl codec, in Rust. For performance reasons,
parts of the rewrite <em>do</em> rely on unsafe code. In this blog post, our goal is to
show that <a href="https://github.com/libjxl/jxl-rs/blob/2c2636a14bd4c9be259b42805578724cfff821f3/jxl/src/entropy_coding/ans.rs#L372">this particular
<code class="language-plaintext highlighter-rouge">unchecked_get</code></a>
is safe, that is, that the index is always within bounds.</p>

<p>Aeneas does not have much support for unsafe code; but unchecked accesses
such as the one above are modeled like regular accesses: one must show that the
index is within bounds in order to show that the function terminates with a
proper result (the <code class="language-plaintext highlighter-rouge">.ok</code> case in Aeneas). In other words, if we can prove that
<code class="language-plaintext highlighter-rouge">read</code> produces an <code class="language-plaintext highlighter-rouge">.ok</code>, then we have shown that the array access is
always within bounds.</p>
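
<p>To make the obligation concrete, here is a minimal Rust sketch of the pattern. It is not the jxl-rs code: <code class="language-plaintext highlighter-rouge">TABLE_SIZE</code>, <code class="language-plaintext highlighter-rouge">lookup_checked</code> and <code class="language-plaintext highlighter-rouge">lookup_unchecked</code> are hypothetical names, and the masking argument is only an assumed example of why an index might always be in bounds. Proving that the checked variant never panics is the same obligation as proving that the unchecked variant is safe.</p>

```rust
/// Hypothetical table size (a power of two, so that masking yields a valid index).
const TABLE_SIZE: usize = 4096;

/// Checked variant: the indexing operation panics if `idx` is out of bounds.
fn lookup_checked(table: &[u16; TABLE_SIZE], state: u32) -> u16 {
    let idx = (state as usize) & (TABLE_SIZE - 1);
    table[idx]
}

/// Unchecked variant: no bounds check is emitted, so safety rests on the proof.
fn lookup_unchecked(table: &[u16; TABLE_SIZE], state: u32) -> u16 {
    let idx = (state as usize) & (TABLE_SIZE - 1);
    // SAFETY: masking with TABLE_SIZE - 1 makes idx at most TABLE_SIZE - 1.
    unsafe { *table.get_unchecked(idx) }
}
```

<p>Both functions return the same value on every input; only the second one relies on the bounds argument for memory safety.</p>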

<h1 id="running-aeneas">Running Aeneas</h1>

<p>I packaged the proofs <a href="https://github.com/AeneasVerif/jxl-proofs">here</a> as a toy
project. To run Aeneas, we must first produce an LLBC file, that is, a cleaned
up version of the MIR representation of the Rust compiler. This is done
by <a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/Makefile#L8">invoking</a>
the Charon tool, passing it a <code class="language-plaintext highlighter-rouge">--start-from</code> flag to extract <code class="language-plaintext highlighter-rouge">read</code> (our target
function), its
transitive dependencies, and nothing else. See <a href="https://arxiv.org/abs/2410.18042">our
paper</a> for more info on Charon and LLBC.</p>

<p>The next step is to <a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/Makefile#L12">run Aeneas
itself</a>, to
produce a Lean representation of the Rust code out of the LLBC. We pick the
<code class="language-plaintext highlighter-rouge">lean</code> backend and generate multiple files, thus separating functions, types,
and proofs.</p>

<p>Aeneas is tied to a particular version of Lean: we adopt the
<a href="https://github.com/AeneasVerif/aeneas/blob/main/backends/lean/lean-toolchain">lean-toolchain</a>
provided by the Aeneas repository.</p>

<h1 id="missing-functions">Missing functions</h1>

<p>The first thing we notice is that the project does not compile (i.e. <code class="language-plaintext highlighter-rouge">lake
build</code> fails), because some Lean definitions are missing. This is because
the corresponding Rust functions are missing from the LLBC. This happens if e.g.
the functions live in a
third-party crate, or if there is no corresponding <code class="language-plaintext highlighter-rouge">--include</code> flag passed
to Charon, which would force extraction to traverse crate boundaries. The
signatures of these functions are emitted by Aeneas as axioms in a Lean file
(<code class="language-plaintext highlighter-rouge">FunsExternal_template.lean</code>), but no body is provided: Aeneas simply does not
know the expected behavior of an external function.</p>

<p>The user can assert facts about an external definition; one
easy way to do this is by providing a function body, in other words, by writing
a trusted model of the external function. Here, we
rename <code class="language-plaintext highlighter-rouge">FunsExternal_template.lean</code> into <code class="language-plaintext highlighter-rouge">FunsExternal.lean</code>, and
<a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/FunsExternal.lean">fill
out</a>
the function definition. We are now asserting that the external
function <code class="language-plaintext highlighter-rouge">read_u64</code> behaves like its model. The Aeneas-specific <code class="language-plaintext highlighter-rouge">scalar_tac</code>
comes in handy to prove that the result is within bounds.</p>

<p>With that, and a <a href="https://github.com/AeneasVerif/jxl-proofs/blob/main/lean/Main.lean#L1">little bit of
boilerplate</a>,
the project builds. We create a <code class="language-plaintext highlighter-rouge">Proofs.lean</code> file and add it to the main
<code class="language-plaintext highlighter-rouge">JxlProofs.lean</code> to make sure it is picked up by the build.</p>

<h1 id="background-on-how-proofs-work-in-aeneas">Background on how proofs work in Aeneas</h1>

<p>Proofs in Aeneas/Lean rely on two key features.</p>

<p>The first feature is the monadic encoding of
Aeneas, which translates a Rust function to a monadic Lean function, i.e. one
that relies on the do-notation. See
<a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Funs.lean">Funs.lean</a>
for examples of this. In short, monadic operations (denoted by a left-arrow ←)
encode the possibility of <em>failure</em>, e.g. if an index is outside of bounds (line
47), or if a left-shift exceeds the bitwidth (line 49).</p>

<p>The second feature is the step tactic and corresponding step lemmas. To reason
about what it takes for such a function to successfully execute (i.e., reach the
<code class="language-plaintext highlighter-rouge">.ok</code> case and never the <code class="language-plaintext highlighter-rouge">.panic</code> case), the <code class="language-plaintext highlighter-rouge">step</code> tactic goes over each
monadic operation (a shift, an index, etc.) and collects the required <em>proof
obligations</em> along the way (that the shift amount does not exceed the bitwidth, or that
the index is within bounds). After running the <code class="language-plaintext highlighter-rouge">step</code> tactic, the proof engineer
is left with a series of subgoals – successfully discharging all of these
subgoals establishes that the function terminates without panics.</p>
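
<p>The same idea can be sketched in plain Rust. This is an analogy, not Aeneas output: <code class="language-plaintext highlighter-rouge">Outcome</code> and <code class="language-plaintext highlighter-rouge">index_then_shift</code> are hypothetical names. Each operation that may fail is made explicit, and a failure short-circuits the rest of the function. Showing that the function returns <code class="language-plaintext highlighter-rouge">Outcome::Ok</code> amounts to discharging one obligation per fallible step.</p>

```rust
/// Hypothetical stand-in for the result type of the monadic encoding.
#[derive(Debug, PartialEq)]
enum Outcome {
    Ok(u32),
    Panic,
}

/// Two fallible steps in sequence, mirroring two monadic binds.
fn index_then_shift(table: &[u32], i: usize, x: u32, s: u32) -> Outcome {
    // Step 1: indexing fails unless i is within the bounds of the table.
    let entry = match table.get(i) {
        Some(e) => *e,
        None => return Outcome::Panic,
    };
    // Step 2: the left shift fails unless s is below the bitwidth (32).
    let shifted = match x.checked_shl(s) {
        Some(v) => v,
        None => return Outcome::Panic,
    };
    Outcome::Ok(entry ^ shifted)
}
```

<p>Here the two obligations are that <code class="language-plaintext highlighter-rouge">i</code> is within the bounds of <code class="language-plaintext highlighter-rouge">table</code> and that <code class="language-plaintext highlighter-rouge">s</code> is below 32; under those two hypotheses, the result is always <code class="language-plaintext highlighter-rouge">Outcome::Ok</code>.</p>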

<h1 id="invariants">Invariants</h1>

<p>Before we actually look at a full example of stepping and proving panic-freedom,
we start in a bottom-up fashion and present our data structure invariants.</p>

<p>The <code class="language-plaintext highlighter-rouge">Proofs.lean</code> file (presented here in its final state) starts with <a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Proofs.lean#L11-L27">invariants on our data
structures</a>.
Some of those were present in English (as Rust comments), and some others were found
by asking the author of the code for additional properties once I realized that
my hypotheses were not strong enough to conduct my proofs.</p>

<p>For the sake of this blog post, we are only doing partial verification: in a
real project, we would make sure the invariants are properly established at data
structure-creation time, and properly maintained throughout all the code-paths
that lead to the <code class="language-plaintext highlighter-rouge">read</code> function we are currently studying.</p>

<h1 id="helper-lemmas">Helper lemmas</h1>

<p>The next section contains a <a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Proofs.lean#L41-L95">bunch of helper
lemmas</a> that I either could not find
in Lean’s standard library, or that are simply missing. We decorate them with:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">simp</code> attributes, so that they are handled by <code class="language-plaintext highlighter-rouge">simp</code> without passing them
explicitly</li>
  <li><code class="language-plaintext highlighter-rouge">scalar_tac</code>, to make them trigger automatically in calls to <code class="language-plaintext highlighter-rouge">scalar_tac</code> with
a suitable pattern</li>
  <li><code class="language-plaintext highlighter-rouge">grind_pattern</code>s, to make them available to <code class="language-plaintext highlighter-rouge">grind</code>, also with a suitable
pattern – note that we add them to the Aeneas grindset (<code class="language-plaintext highlighter-rouge">agrind</code>) so that
they participate in automatically discharging subgoals while calling <code class="language-plaintext highlighter-rouge">step*</code>
– more on that below.</li>
</ul>

<p>Again, those were found throughout the course of writing the proofs and this
section grew incrementally – I am simply presenting the final state of the
<code class="language-plaintext highlighter-rouge">Proofs.lean</code> file. One interesting tidbit is the <code class="language-plaintext highlighter-rouge">case_usize</code> helper, a
particularly annoying lemma that states that a property true both in the 32- and
the 64-bit cases is true for any usize. With the help of Claude (thanks Son!),
we now have a quite concise and effective proof.</p>

<p>The theorems showcase some of the Aeneas tactics: <code class="language-plaintext highlighter-rouge">bvify</code>, <code class="language-plaintext highlighter-rouge">bv_tac</code>,
<code class="language-plaintext highlighter-rouge">scalar_tac</code>, all fine-tuned to work well on the kind of goals produced by
dealing with machine integers.</p>

<p>These are typically the kind of boring, mundane and tedious lemmas that one would
do well to prove with AI. Personally, I find those supremely boring.</p>

<h1 id="proving-the-absence-of-panics">Proving the absence of panics</h1>

<p>Let’s now move on to something slightly more interesting, and write our first
specification.</p>

<p>Again, for the sake of example, we focus on the absence of panics. Our
specification is thus going to be quite simple:
we set out to prove Hoare triples of the form <code class="language-plaintext highlighter-rouge">f ⦃ r =&gt; True ⦄</code>, i.e. function <code class="language-plaintext highlighter-rouge">f</code>
terminates with successful result <code class="language-plaintext highlighter-rouge">r</code>, no further properties stated (<code class="language-plaintext highlighter-rouge">True</code>).</p>

<p>A first fun example is <code class="language-plaintext highlighter-rouge">read_u64_spec</code>, <a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Proofs.lean#L128">presented
here</a>.
It simply states that, provided that there are at least 8 bytes in
the slice passed as an argument (the hypothesis <code class="language-plaintext highlighter-rouge">h</code> over the argument <code class="language-plaintext highlighter-rouge">bytes</code>),
<code class="language-plaintext highlighter-rouge">read_u64</code> terminates without panics. The proof script itself is quite
simple: we unfold the definition, <code class="language-plaintext highlighter-rouge">step</code> over all of the monadic steps of
<code class="language-plaintext highlighter-rouge">read_u64</code> itself, and use the <code class="language-plaintext highlighter-rouge">&lt;;&gt;</code> combinator to apply the <code class="language-plaintext highlighter-rouge">grind</code> tactic on
all the remaining subgoals. I highly recommend following along in your
favorite editor to see the effects!</p>

<p>One key thing to note: <code class="language-plaintext highlighter-rouge">read_u64_spec</code> is itself decorated with <code class="language-plaintext highlighter-rouge">@[step]</code>. It
means that <em>callers</em> of <code class="language-plaintext highlighter-rouge">read_u64</code>, when they use <code class="language-plaintext highlighter-rouge">step</code> themselves, will be
able to leverage this theorem and generate a subgoal (“there are at least eight
bytes in the slice passed to <code class="language-plaintext highlighter-rouge">read_u64</code>”) to show termination.</p>
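
<p>For intuition, here is a minimal Rust sketch of a function with that precondition. It is a hypothetical stand-in, not the actual <code class="language-plaintext highlighter-rouge">read_u64</code>: in particular, the little-endian byte order is an assumption. The only way it can panic is through the slice accesses, which is why "at least 8 bytes" is exactly the hypothesis needed.</p>

```rust
/// Hypothetical stand-in for `read_u64`; little-endian order is an assumption.
fn read_u64(bytes: &[u8]) -> u64 {
    let mut value = 0u64;
    for i in 0..8 {
        // bytes[i] panics when the slice holds fewer than 8 bytes:
        // this is the obligation that the hypothesis rules out.
        // The shift amount 8 * i is at most 56, so the shift cannot overflow.
        value |= (bytes[i] as u64) << (8 * i);
    }
    value
}
```

<p>Any caller must then establish that its slice holds at least eight bytes, which mirrors the subgoal that callers obtain when stepping over the call.</p>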

<p>A second fun example is the <a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Proofs.lean#L100"><code class="language-plaintext highlighter-rouge">refill_slow_loop_does_not_panic</code>
theorem</a>,
which proves that the function always terminates, no panics, and no further
hypotheses required.
This proof showcases a recent addition to
Aeneas: reasoning about loops via combinators, rather than recursive functions.
Here, to show loop termination, it suffices to find a decreasing measure, and
because we only focus on the absence of panics, no further loop invariant is
needed.</p>
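
<p>As an illustration of the decreasing-measure argument, consider the following Rust sketch of a refill loop. It is hypothetical and not the jxl-rs source: the name <code class="language-plaintext highlighter-rouge">refill_slow</code>, the 64-bit buffer and the zero-padding past the end of the input are all assumptions. The measure is the number of free bits in the buffer, which shrinks by 8 at every iteration.</p>

```rust
/// Hypothetical bit-buffer refill loop (an illustration, not the jxl-rs source).
/// Returns the updated (position, buffer, bit count).
fn refill_slow(data: &[u8], mut pos: usize, mut buf: u64, mut nbits: u32) -> (usize, u64, u32) {
    // Decreasing measure: the number of free bits, 64 - nbits.
    while nbits <= 56 {
        let measure_before = 64 - nbits;
        // Past the end of the input we feed zeros instead of indexing out of bounds.
        let byte = if pos < data.len() { data[pos] } else { 0 };
        if pos < data.len() {
            pos += 1;
        }
        // nbits is at most 56 here, so the shift amount is below 64.
        buf |= (byte as u64) << nbits;
        nbits += 8;
        // The measure strictly decreases, hence the loop terminates.
        debug_assert!(64 - nbits < measure_before);
    }
    (pos, buf, nbits)
}
```

<p>No hypothesis on the arguments is needed in this sketch: every index is guarded, every shift amount is below 64, and the loop body is skipped entirely when <code class="language-plaintext highlighter-rouge">nbits</code> already exceeds 56.</p>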

<p>Proof of termination for other functions is remarkably mundane, and <code class="language-plaintext highlighter-rouge">step*</code>
successfully chomps through the monadic steps of the Aeneas encoding. One thing
to note is that <code class="language-plaintext highlighter-rouge">step*</code> will try to discharge subgoals
automatically: the subgoals you see after <code class="language-plaintext highlighter-rouge">step*</code> are those that could <em>not</em> be
discharged automatically. This is where annotating our earlier lemmas
with the correct attributes (<code class="language-plaintext highlighter-rouge">agrind</code>, etc.) pays off. Another interesting tidbit: the treatment
of global <code class="language-plaintext highlighter-rouge">const</code> declarations in Rust was recently improved in Aeneas, and now
with the proper
<a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Proofs.lean#L153">attributes</a>,
one no longer needs dedicated <code class="language-plaintext highlighter-rouge">simp [global_simps]</code> calls to inline away
constants: this now happens automatically.</p>

<h1 id="trick-introducing-a-helper-on-the-fly">Trick: introducing a helper on the fly</h1>

<p>Because Aeneas works on Rust code as written by the programmer, it is sometimes
beneficial to introduce a helper, to factor out
commonality in the code that we are looking at.</p>

<p>Here,
<a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Proofs.lean#L34">bucket_index</a>
is a conceptual helper that our target function (<code class="language-plaintext highlighter-rouge">read</code>) does not use; however,
we wish to reason about it independently, for modularity of proofs. We write a
<a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Proofs.lean#L189">step
lemma</a>
for it, this time with an interesting post-condition! Then, we introduce a
<a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Proofs.lean#L203">rewrite
lemma</a>
that allows substituting a series of monadic operations that correspond to
<code class="language-plaintext highlighter-rouge">bucket_index</code> with a call to <code class="language-plaintext highlighter-rouge">bucket_index</code> itself.</p>

<p>This is very helpful to make proofs more modular.</p>
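
<p>The trick can be pictured in Rust as follows. Everything in this sketch is hypothetical: the constants, the body of <code class="language-plaintext highlighter-rouge">bucket_index</code> and the two <code class="language-plaintext highlighter-rouge">read_*</code> functions are made up for illustration and do not reproduce the jxl-rs computation. The point is only the shape of the argument: the inline code and the helper compute the same value, and the helper carries a post-condition (its result is a valid index) that can be proven once and reused.</p>

```rust
/// Hypothetical parameters: a 12-bit state split into 64 buckets of 64 entries.
const LOG_BUCKET_SIZE: u32 = 6;
const TABLE_LEN: usize = 64;

/// Conceptual helper. Post-condition: the result is below TABLE_LEN,
/// because (state & 0xfff) is below 4096 = 64 * 64.
fn bucket_index(state: u32) -> usize {
    ((state & 0xfff) >> LOG_BUCKET_SIZE) as usize
}

/// The code as written: the index computation is inlined.
fn read_inline(table: &[u8; TABLE_LEN], state: u32) -> u8 {
    let masked = state & 0xfff;
    let i = (masked >> LOG_BUCKET_SIZE) as usize;
    table[i]
}

/// The code after rewriting: the same steps, folded into the helper.
fn read_with_helper(table: &[u8; TABLE_LEN], state: u32) -> u8 {
    table[bucket_index(state)]
}
```

<p>In this analogy, the rewrite lemma states that <code class="language-plaintext highlighter-rouge">read_inline</code> and <code class="language-plaintext highlighter-rouge">read_with_helper</code> agree, and the step lemma for the helper supplies the bound that makes the array access succeed.</p>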

<h1 id="final-proof">Final proof</h1>

<p>The <a href="https://github.com/AeneasVerif/jxl-proofs/blob/a1545ef5556f7a2653070c49fdf56a5ebd1bf6f7/lean/JxlProofs/Proofs.lean#L222">final
proof</a>
is relatively concise. Of note: we duplicate the main invariant so that both its
folded form and its unfolded form are available; and we rely on
<code class="language-plaintext highlighter-rouge">bucket_index_eq</code> to materialize the helper in our goal. Many of the steps rely
on <code class="language-plaintext highlighter-rouge">grind</code>, which is now officially supported by Aeneas.</p>

<p>Most of the subgoals are discharged automatically: the remaining ones rely on
additional facts about bounds that can be proven using <code class="language-plaintext highlighter-rouge">bv_tac</code>.</p>

<p>The final two subgoals share a lot of commonality: the same variables are in
scope and the same facts about those are needed in both cases. To share the
reasoning, I used <code class="language-plaintext highlighter-rouge">all_goals</code>.</p>

<h1 id="a-partial-proof">A partial proof</h1>

<p>There are many more interesting things that could be proven about this example.
Notably, the <code class="language-plaintext highlighter-rouge">read</code> function does <em>not</em> actually preserve the invariant: it only
does so as long as we have not reached the end of the input stream. This would
have to be reflected in its post-condition (omitted here), and we would have to
check that all callers handle the end-of-stream state properly.</p>

<h1 id="conclusion">Conclusion</h1>

<p>I hope this brief example provides a glimpse of how proofs typically work in
Aeneas. With suitable annotations and judicious usage of <code class="language-plaintext highlighter-rouge">grind</code>, I believe
those proofs occupy a sweet spot: they enjoy a high degree of automation,
relying on SMT-like tactics (<code class="language-plaintext highlighter-rouge">grind</code>), or domain-specific tactics (<code class="language-plaintext highlighter-rouge">bv_tac</code>),
while still reaping the benefits of interactivity, namely, being able to see the
goal and debug why a proof is not going through.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Aeneas is a toolchain for verifying Rust programs, relying on a functional translation to Lean. Aeneas is now nearly five years old, and many excellent improvements have landed recently, thanks to the work of Son Ho, Aymeric Fromherz, Guillaume Boisseau and many other collaborators. With this blog post, I’m hoping to showcase how proofs in Aeneas look like these days, and highlight recent improvements, by relying on a sample problem given to me by my colleague Luca Versari. I think this blog post can also serve as a nice example / tutorial on how to work with Aeneas on a real-world piece of code.]]></summary></entry><entry><title type="html">Eurydice: a Rust to C compiler (yes)</title><link href="http://jonathan.protzenko.fr/2025/10/28/eurydice.html" rel="alternate" type="text/html" title="Eurydice: a Rust to C compiler (yes)" /><published>2025-10-28T08:00:00-07:00</published><updated>2025-10-28T08:00:00-07:00</updated><id>http://jonathan.protzenko.fr/2025/10/28/eurydice</id><content type="html" xml:base="http://jonathan.protzenko.fr/2025/10/28/eurydice.html"><![CDATA[<p>Perhaps the greatest surprise of the last two years was, for me, the realization
that people not only care about compiling C to Rust (for obvious reasons, such
as, ahem, memory safety) – they also care about compiling Rust to C! Wait,
what?</p>

<p>I <a href="/2024/01/05/eurydice.html">wrote about this</a> briefly a couple years
ago, but the level of interest for the project, I must say, took me somewhat by
surprise. So let’s talk about compiling Rust to C a little more today.</p>

<h1 id="barriers-to-rust-adoption">Barriers to Rust adoption</h1>

<p>Rust is making big progress in terms of adoption, and represents a great value
proposition, especially for new code. Both my <a href="https://microsoft.com">former
employer</a> and my <a href="https://google.com">new employer</a>, like
pretty much everyone else these days, have big projects that are written in pure
Rust or can have Rust components. Even <a href="https://techcommunity.microsoft.com/blog/windowsdriverdev/towards-rust-in-windows-drivers/4449718">Windows kernel drivers can be written
in
Rust</a>
now. Amazing stuff.</p>

<p>However, if your project is, say, an open-source library that gets compiled on a
wonderfully diverse set of target architectures, OSes, distributions and
toolchains, well, chances are… one of these is not going to support Rust. Think of a
crypto library: there <strong>will</strong> be people out there with an obscure compiler for a weird
embedded target, and they really want to compile your library, because they’ve
been told not to roll their own crypto. Or perhaps you have a format library
ridden with memory errors and you want to port it to Rust. Or maybe your company
has an in-house analysis that only runs on C code. Regardless of the scenario,
there will always be that one legacy use-case that prevents you from switching
to Rust until it’s 2035, all those LTS versions (looking at you RHEL) are
finally retired, and you yourself are too close to retirement to even care
anymore.</p>

<p>That is, unless you’re willing to use a Rust to C compiler.</p>

<h1 id="why">Why?</h1>

<p>Having a backwards-compat scenario where Rust can be compiled to C serves
several purposes.</p>

<ol>
  <li>It allows for a gradual transition. The codebase can be ported to Rust,
and refactored / cleaned up / rewritten to use all the nice Rust things (data
types, pattern-matching, polymorphism, memory safety), thus making you and
your developers much, much happier. Meanwhile, the C version co-exists so
that you don’t alienate your userbase.</li>
  <li>It only requires maintaining a single version. The Rust code is
authoritative; the C code is derived from it automatically, either on CI, or
at least with a CI job that checks that the two are in sync.</li>
  <li>It allows for a census of problematic scenarios. By making the Rust version
the default (and putting the fallback C behind a <code class="language-plaintext highlighter-rouge">--write-us-an-email</code> flag),
there is finally a way to enumerate those mythical users who cannot switch to
Rust just yet.</li>
</ol>

<p>If that sounds appealing, meet Eurydice.</p>

<h1 id="eurydice">Eurydice</h1>

<p>Eurydice is a compiler from Rust to C that aims to produce <em>readable</em> C code. Of
course, readability is subjective; also, seeing that Rust relies on
whole-program monomorphization, the C code is bound to be more verbose than the
Rust code. But you can judge for yourself: here’s the result of <a href="https://github.com/AeneasVerif/eurydice/blob/9b14f74c05228b8335700efcffb55bf82a991975/out/test-libcrux/libcrux_mlkem_portable.c#L936">compiling
libcrux to
C</a>.</p>

<p>The output of the test suite is under version control, and there are <a href="https://github.com/AeneasVerif/eurydice/tree/9b14f74c05228b8335700efcffb55bf82a991975/out">a lot more
tests</a>
to peruse. See for instance <a href="https://github.com/AeneasVerif/eurydice/blob/9b14f74c05228b8335700efcffb55bf82a991975/out/test-symcrust/symcrust.c">this
bit</a>,
compared to the <a href="https://github.com/AeneasVerif/eurydice/blob/9b14f74c05228b8335700efcffb55bf82a991975/test/symcrust.rs">Rust
original</a>.</p>

<h1 id="the-design-of-eurydice">The design of Eurydice</h1>

<p>Eurydice plugs in directly at the MIR level, using
<a href="https://github.com/AeneasVerif/charon/">Charon</a> to avoid reimplementing the
wheel and paying the price of interacting with the guts of <code class="language-plaintext highlighter-rouge">rustc</code>. <a href="https://arxiv.org/abs/2410.18042">Our
paper</a> on Charon says more about its
architecture.</p>

<p>The advantage of plugging in at the MIR level is that i) we do not have to
interpret syntactic sugar, which means our translation is more faithful to the
Rust semantics, and ii) we have way fewer constructs that need compiling to C. Even then,
it’s no easy feat to translate Rust to C.</p>

<p>There is, naturally, the need to perform whole-program monomorphization, over
types and const-generic arguments; the compilation of pattern matches into
tagged unions; recognizing instances of iterators that can be compiled to native
C <code class="language-plaintext highlighter-rouge">for</code>-loops. Then, there are more subtle things, such as compiling array
repeat expressions sensibly – zero-initializers when possible, initializer
lists otherwise, unless it generates too much code, in which case <code class="language-plaintext highlighter-rouge">for</code>-loops are
preferable. And finally, there are all the rules about visibility, <code class="language-plaintext highlighter-rouge">static</code>,
<code class="language-plaintext highlighter-rouge">inline</code>, etc. that are very C-specific and depend on how you want to lay out
your C files.</p>

<p>The translation is complicated by the constraint that the generated code
ought to be readable: for instance, we compile Rust structs to
C structs, including
<a href="https://doc.rust-lang.org/reference/dynamically-sized-types.html">DST</a>s, by
relying on <a href="https://en.cppreference.com/w/c/language/struct.html">flexible array
members</a>.
We also
work hard to avoid using the fully-generic tagged union pattern when possible,
instead eliminating the tag when e.g. the Rust enum only has a single case.
Additionally, we rely on Charon to reconstruct control-flow, rather than compile
the MIR <a href="https://en.wikipedia.org/wiki/Control-flow_graph">CFG</a> to C code ridden
with <code class="language-plaintext highlighter-rouge">goto</code>s; again, this is for code quality.</p>

<p>At a low level, there were many interesting tidbits.</p>
<ul>
  <li>Because arrays in Rust are values, we wrap them within C structs to give them
value semantics in C, too; concretely, <code class="language-plaintext highlighter-rouge">[u32; 8]</code> becomes <code class="language-plaintext highlighter-rouge">struct {
uint32_t data[8]; }</code>. (A previous version of Eurydice would emit <code class="language-plaintext highlighter-rouge">uint32_t *</code>,
and rely on various <code class="language-plaintext highlighter-rouge">memcpy</code>s to implement value semantics, but this produced
a translation that was not type-generic, and there were plenty of finicky
corner cases. We revamped the compilation scheme recently.)</li>
  <li>The notion of <code class="language-plaintext highlighter-rouge">lvalue</code> in C means we need to insert more variable declarations
than in Rust – for instance, you can’t trivially compile <code class="language-plaintext highlighter-rouge">&amp;[0u32; 1]</code> without
naming the array.</li>
  <li>The fact that the evaluation order is so loosely defined in C means that
intermediary computations need to be stored in intermediary variables to
enforce the evaluation order.</li>
  <li>Rust relies on whole-program monomorphization; this means that the C code is
inevitably going to contain multiple copies of the same types and functions,
but for different choices of type and const generic arguments. This is
currently done with a builtin phase in Eurydice (for historical reasons), but
in the long run, we want to rely on Charon’s support for monomorphization.</li>
  <li>There are plenty of peephole optimizations that are required for good code
quality, such as recognizing <code class="language-plaintext highlighter-rouge">array::from_fn</code> and generating sensible code
that initializes the array in-place (instead of relying on the fully-general
compilation scheme for closures), or recognizing instances of the <code class="language-plaintext highlighter-rouge">Eq</code>
trait that deserve dedicated treatment (such as using <code class="language-plaintext highlighter-rouge">memcmp</code> for arrays and
slices of flat data).</li>
</ul>
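
<p>The first point above is easy to observe on the Rust side. In the following small sketch (<code class="language-plaintext highlighter-rouge">bump_first</code> is a hypothetical name), passing an array to a function copies it, so the caller’s array is left untouched. A C translation that passed a bare pointer would not preserve this behavior without extra copies, which is what wrapping the array in a struct provides for free.</p>

```rust
/// Takes the array by value: the function works on its own copy.
fn bump_first(mut a: [u32; 8]) -> [u32; 8] {
    a[0] += 1;
    a
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">bump_first(original)</code> returns an array whose first element is incremented, while <code class="language-plaintext highlighter-rouge">original</code> still holds its initial contents.</p>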

<p>A final design choice is that for now, Eurydice may define more behaviors than
Rust – for instance, Rust panics on integer overflow, but Eurydice-compiled
code does not. This is because we assume the input code is verified, and
therefore has been shown to be free of panics. This design choice can be easily
changed, though.</p>

<p>In practice, as soon as you use traits, the C code becomes more voluminous than
the Rust code. We rely on a configuration file mechanism to control the
placement of monomorphized instances of a given function, rather than put
everything in one big C file. This currently requires a lot of manual
intervention to give good results on large projects.</p>

<h1 id="implementing-of-eurydice">Implementation of Eurydice</h1>

<p>Eurydice starts by compiling the MIR AST obtained out of Charon into
<a href="https://github.com/FStarLang/karamel/">KaRaMeL</a>’s internal AST. This is ~3000
lines of OCaml code, so that’s already pretty involved. A lot of the work
revolves around trait methods and their monomorphization, given Rust’s
expressive trait system.</p>

<p>Then, about 30 nanopasses simplify the KaRaMeL AST until it becomes eligible for
compilation to C. Of those, a handful were originally written for KaRaMeL and
were somewhat reusable; this includes compilation of data types, as well as
monomorphization. The rest was written from scratch for Eurydice, and totals
about 5000 lines of OCaml code.</p>

<p>A particularly gnarly phase was eliminating MIR’s variable assignments as much
as possible: in MIR, every variable starts out uninitialized at the beginning of
the function; then, <em>in lieu</em> of the variable declaration, we have an assignment
with the initial value. Naturally, having a variable declaration in the right
spot is better for code quality, so an initial phase tries to reconstruct these
assignments. That’s a drawback of using MIR, but we still firmly believe that
sticking to something that has clear semantics is ultimately better.</p>
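
<p>The difference can be mimicked in surface Rust, which also accepts deferred initialization. The two hypothetical functions below behave identically; the first one mirrors the MIR shape (a declaration followed by an assignment), the second one is the shape we would like to see in the generated C.</p>

```rust
/// MIR-like shape: the variable is declared first and assigned later.
fn mir_style(x: u32) -> u32 {
    let y; // declared, not yet initialized
    y = x + 1; // the first assignment plays the role of the initializer
    y * 2
}

/// Desired shape: the declaration carries the initial value.
fn source_style(x: u32) -> u32 {
    let y = x + 1;
    y * 2
}
```

<p>Reconstructing the second form from the first is only valid when the assignment dominates all uses of the variable, which is part of what makes the phase delicate.</p>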

<p>Fun fact: because there are so many peephole optimizations, I got tired of
maintaining <a href="https://github.com/AeneasVerif/eurydice/blob/29a05cc79df4d63d6a0a3816f1617a3bba4814e2/lib/Cleanup2.ml#L616-L655">enormous
pattern-matches</a>
that would try to catch every flavor of
Rust iterator that can be compiled to a C for-loop. Instead, a <a href="https://github.com/AeneasVerif/eurydice/blob/94cdf3b2ea4541b658dff74e4307ce01041fcc22/cremepat">custom OCaml syntax
extension</a> allows writing <a href="https://github.com/AeneasVerif/eurydice/blob/94cdf3b2ea4541b658dff74e4307ce01041fcc22/lib/Cleanup2.ml#L609-L626">concrete
syntax</a>
for the internal KaRaMeL language in OCaml patterns. Those magic patterns then get
translated at compile-time to OCaml AST nodes for an actual OCaml pattern that
matches the (deeply-embedded) syntax of KaRaMeL’s AST. This relies on a <code class="language-plaintext highlighter-rouge">ppx</code>
that lexes, parses and compiles the concrete syntax.</p>

<h1 id="deploying-eurydice-generated-code">Deploying Eurydice-generated code</h1>

<p>Eurydice-generated code expects some hand-written glue that contains macros and
<code class="language-plaintext highlighter-rouge">static inline</code> functions; sometimes, it’s simply more convenient to write a
single macro that uses a type, rather than have Eurydice generate N copies of a
polymorphic function that gets specialized each time. A typical example is
compiling the Eq trait for arrays: it’s nicer to emit <code class="language-plaintext highlighter-rouge">Eurydice_array_eq(a1, a2,
len, t)</code>, which macro-expands to <code class="language-plaintext highlighter-rouge">!(memcmp(a1, a2, len*sizeof(t)))</code>, rather than
have N such functions, each containing a for-loop specialized for different
values of <code class="language-plaintext highlighter-rouge">t</code>.</p>
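
<p>A minimal, self-contained sketch of this trade-off follows. The macro body is taken from the expansion quoted above; the two callers are hypothetical and only show the same macro being reused at different element types. The actual glue header may differ, and comparing with <code class="language-plaintext highlighter-rouge">memcmp</code> is only meaningful for element types that have no padding bytes.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* One macro serves every element type t: the type is only used to compute
 * the number of bytes to compare. */
#define Eurydice_array_eq(a1, a2, len, t) \
  (!(memcmp((a1), (a2), (len) * sizeof(t))))

/* Hypothetical caller, standing in for a Rust comparison of two [u8; 32]. */
bool keys_equal(const uint8_t k1[32], const uint8_t k2[32]) {
  return Eurydice_array_eq(k1, k2, 32, uint8_t);
}

/* Hypothetical caller, standing in for a Rust comparison of two [u32; 8]. */
bool words_equal(const uint32_t w1[8], const uint32_t w2[8]) {
  return Eurydice_array_eq(w1, w2, 8, uint32_t);
}
</code></pre></div></div>

<p>Without the macro, each of these callers would need its own generated comparison function with its own loop.</p>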

<p>Eurydice generates code that is either C11- and C++20-compatible, or
C++17-compatible but not C-compatible. The reason for this is that Rust allows enum
values (e.g. <code class="language-plaintext highlighter-rouge">Foo { bar: baz }</code>) in any expression position. For simplicity,
Eurydice emits a compound literal <code class="language-plaintext highlighter-rouge">(Foo) { .tag = bar, .value = { .case_Foo
= { .bar = baz }}}</code>, or a C++20 aggregate that uses designated initializers,
relying on a macro (not shown here) to hide the syntax differences between the
two. But C++17 does not have designated initializers, so there is an option for
Eurydice to emit different code that <a href="https://github.com/AeneasVerif/eurydice/blob/94cdf3b2ea4541b658dff74e4307ce01041fcc22/include/eurydice_glue.h#L37">relies on member pointers</a> to achieve
roughly the <a href="https://github.com/AeneasVerif/eurydice/blob/94cdf3b2ea4541b658dff74e4307ce01041fcc22/out/testxx-result/result.cc#L34">same effect</a>.</p>
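
<p>On the C side, the idea can be sketched as follows. The <code class="language-plaintext highlighter-rouge">Shape</code> type, its variants and the helper functions are invented for this example and are not taken from Eurydice’s output; the point is only that a compound literal with designated initializers can sit directly in argument position, just like the Rust expression it comes from.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* Hypothetical C encoding of the Rust enum
 *   enum Shape { Circle { radius: u32 }, Rect { w: u32, h: u32 } }
 * as a tag plus a union of per-variant payloads. */
typedef enum { Shape_Circle, Shape_Rect } Shape_tag;

typedef struct {
  Shape_tag tag;
  union {
    struct { uint32_t radius; } case_Circle;
    struct { uint32_t w; uint32_t h; } case_Rect;
  } value;
} Shape;

static uint32_t bounding_width(Shape s) {
  return s.tag == Shape_Circle ? 2 * s.value.case_Circle.radius
                               : s.value.case_Rect.w;
}

/* The enum value is built in expression position: no named temporary. */
uint32_t rect_width(uint32_t w, uint32_t h) {
  return bounding_width(
      (Shape){ .tag = Shape_Rect,
               .value = { .case_Rect = { .w = w, .h = h } } });
}

uint32_t circle_width(uint32_t radius) {
  return bounding_width(
      (Shape){ .tag = Shape_Circle,
               .value = { .case_Circle = { .radius = radius } } });
}
</code></pre></div></div>

<p>This form is valid C11 as written, which is why a separate strategy is needed for C++17.</p>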

<h1 id="limitations-of-eurydice">Limitations of Eurydice</h1>

<p>Naturally, there are many limitations to this approach. Here are the
main ones that come to mind:</p>
<ul>
  <li>we cannot guarantee that the layout of objects will be the same in C as in
Rust; conceivably, one could parse the layout information from MIR, then emit
compiler-specific alignment directives to keep the two identical, but this is
not done currently;</li>
  <li>the generated code <a href="https://github.com/AeneasVerif/eurydice/blob/main/out/test-dst/dst.c#L15">violates strict
aliasing</a>,
because creating a user-defined DST involves casting one pointer type (a
struct containing an array) to another (a struct with a flexible array
member instead); I’m not sure what the best fix is, so for now, please compile your
code with <code class="language-plaintext highlighter-rouge">-fno-strict-aliasing</code>;</li>
  <li>the code that Eurydice sees is MIR <em>after</em> applying <code class="language-plaintext highlighter-rouge">cfg</code> tweaks; this means
that for code that is intended to be multi-platform, <a href="https://github.com/AeneasVerif/eurydice/pull/260">some
tricks</a> need to be applied,
otherwise, Eurydice will only “see” one version of the code (AVX2, or ARM64,
or something else);</li>
  <li>because monomorphization is so pervasive, the configuration language needs to
express things such as “types that reference <code class="language-plaintext highlighter-rouge">__m256i</code>, an AVX2-only type,
need to go into a separate file to be compiled with <code class="language-plaintext highlighter-rouge">-mavx2</code>”; this can get
tedious <a href="https://github.com/AeneasVerif/eurydice/blob/94cdf3b2ea4541b658dff74e4307ce01041fcc22/test/libcrux/c.yaml">real
fast</a>
but I’m not sure I know how to do better.</li>
</ul>

<h1 id="whats-next">What’s next?</h1>

<p>There is ongoing work to integrate Eurydice-generated code for both
<a href="https://www.microsoft.com/en-us/research/blog/rewriting-symcrypt-in-rust-to-modernize-microsofts-cryptographic-library/">Microsoft</a>
and
<a href="https://boringssl-review.googlesource.com/c/boringssl/+/77027?tab=comments">Google</a>’s
respective crypto libraries.</p>

<p>The community grew recently, with wonderful contributions by GitHub users
@ssyram and @lin23299. There are more in the pipeline, and I look forward to
seeing the supported subset of Rust grow even more. Next on the horizon is
support for <code class="language-plaintext highlighter-rouge">dyn</code> traits via vtables, and relying on Charon’s monomorphization
to get MIR exactly as the Rust compiler would monomorphize it, instead of relying
on a custom procedure in Eurydice.</p>

<p>An ambitious goal is for the whole standard library of Rust to be extractable
via Eurydice in 2026. This is non-trivial, but I believe this achievement is
within reach. Stay tuned.</p>

<h1 id="ps-why-the-name">PS: Why the name?</h1>

<p>People keep asking about the name; because the project shares a large amount of
infrastructure with <a href="https://github.com/AeneasVerif/aeneas">Aeneas</a> and
<a href="https://github.com/AeneasVerif/charon">Charon</a>, I had to follow the Greek
mythology theme. Specifically, the myth of
<a href="https://en.wikipedia.org/wiki/Eurydice">Eurydice</a> resonated with me: I thought
I was saved from the hell of <a href="/2019/01/04/behind-the-scenes.html">generating C code</a>, and was going to go back to the world of the
living, but alas, no.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Perhaps the greatest surprise of the last two years was, for me, the realization that people not only care about compiling C to Rust (for obvious reasons, such as, ahem, memory safety) – they also care about compiling Rust to C! Wait, what?]]></summary></entry><entry><title type="html">15,000 lines of verified cryptography now in Python</title><link href="http://jonathan.protzenko.fr/2025/04/18/python.html" rel="alternate" type="text/html" title="15,000 lines of verified cryptography now in Python" /><published>2025-04-18T08:00:00-07:00</published><updated>2025-04-18T08:00:00-07:00</updated><id>http://jonathan.protzenko.fr/2025/04/18/python</id><content type="html" xml:base="http://jonathan.protzenko.fr/2025/04/18/python.html"><![CDATA[<p>In November 2022, I opened <a href="https://github.com/python/cpython/issues/99108">issue 99108</a> on
Python’s GitHub repository, arguing that after a recent CVE in its implementation of
SHA3, Python should embrace verified code for all of its hash-related infrastructure.</p>

<p>As of last week, this issue is now closed, and every single hash and HMAC algorithm exposed by
default in Python is now provided by <a href="https://github.com/project-everest/hacl-star/">HACL*</a>, the
verified cryptographic library. There was no loss of functionality, and the transition was entirely
transparent for Python users. Python now vendors (includes in its repository) 15,000 lines of
verified C code from HACL*. Pulling newer versions from the upstream HACL* repository
is entirely automated and is done by invoking a script. HACL* was able to successfully implement
new features to meet all of the requirements of Python, such as: additional modes for the Blake2
family of algorithms, a new API for SHA3 that covers all Keccak variants, strict abstraction
patterns to deal with build system difficulties, proper error management (notably, allocation
failures), and instantiating HACL’s generic “streaming” functionality with the HMAC algorithm,
including an optimization that requires keeping two hash states at once.</p>

<p>This is the culmination of 2.5 years of work, and could not have happened without the invaluable
help of <a href="https://aymericfromherz.github.io/">Aymeric Fromherz</a>, who shouldered a lot of the
implementation burden. <a href="https://www.sonho.fr/">Son Ho</a> had a key contribution early on, generalizing
HACL’s “streaming” functionality to be able to handle block algorithms that require a “pre-input”.
This slightly obscure generalization was actually essential to implement a suitable, optimized HMAC
that keeps two hash states under the hood. On the Python side, Gregory P. Smith, Bénédikt Tran, and
later Chris Eibl were big champions and provided a lot of help. Finally, the HACS series of
workshops created connections (hello, Paul Kehrer!) and provided sufficient momentum to make this
happen. A warm thank you to both the Python and verified cryptographic communities!</p>

<p>As I oftentimes like to do, I’ll provide a little bit of a look behind the scenes, and comment on
some of the more interesting technical aspects that are too low-level for a research paper.</p>

<h2 id="a-primer-on-streaming-apis">A Primer on Streaming APIs</h2>

<p>Many cryptographic algorithms are <em>block</em> algorithms, meaning that they assume their input is
provided block by block, with special treatment for the first and last blocks. Well-known block
algorithms include hash algorithms, MAC algorithms (e.g. Poly1305, HMAC), and more. In practice, the
block API is not user-friendly: rarely does one have the data already chunked in blocks;
furthermore, computing a result (e.g., a hash) invalidates the state of the block algorithm, which
makes e.g. computing the intermediary hashes of the TLS transcript difficult.</p>

<p>For these reasons, cryptographic libraries typically expose <em>streaming</em> APIs, meaning clients can
provide inputs of any length; the library then takes care of buffering the input, flushing the
buffer as soon as a full block is obtained. A streaming API typically also allows extracting
intermediary hashes without invalidating the state.</p>

<p>Streaming APIs are hard, because they manipulate long-lived state with complex invariants: the
(unverified) reference implementation of SHA3 was hit with a <a href="https://nvd.nist.gov/vuln/detail/cve-2022-37454">bad
CVE</a> in 2022, which Python “inherited”, because it
vendored that very same SHA3 implementation.</p>

<p>Streaming APIs are hard also because the
underlying block algorithms differ in myriad ways: all hash algorithms accept an
empty final block, <em>except</em> for Blake2; some need to retain the key at run-time (Poly1305), some can discard it
after initialization (HMAC, optimized); some need some initial input before processing the key
(Blake2), some don’t; and so on.</p>

<p>Given this inherent complexity, streaming algorithms are a good candidate for verification: we wrote
a <a href="https://arxiv.org/abs/2102.01644">research paper</a> in 2021 about this very problem.</p>

<h2 id="fully-generic-verification">Fully generic verification</h2>

<p>The main idea from the paper is that one can capture, using dependent types, what a block algorithm
<em>is</em>. Once that’s done, it suffices to author and verify a generic streaming construction once and
for all over an abstract definition of block algorithms. Then, just like you instantiate a template
in C++, you apply the generic streaming construction to a concrete block API and voilà – a
streaming API for that one block algorithm, “for free”.</p>
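
<p>For readers who think in C rather than in dependent types, here is a loose, unverified analogue of that idea. A record of function pointers plays the role of the abstract block algorithm, and the streaming functions are written once against that record. Everything here is an assumption made for the sake of illustration: the names, the 64-bit state, and the two toy “algorithms”, which are not real hashes. The sketch also flushes its buffer eagerly, which is simpler than what HACL* actually does.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

#define MAX_BLOCK_LEN 16

/* What the generic construction needs to know about a block algorithm. */
typedef struct {
  size_t block_len; /* between 1 and MAX_BLOCK_LEN */
  uint64_t init_state;
  uint64_t (*process_block)(uint64_t st, const uint8_t *block);
  uint64_t (*finish)(uint64_t st, const uint8_t *last, size_t last_len);
} block_alg;

/* Long-lived streaming state; invariant: buf_len &lt; alg-&gt;block_len. */
typedef struct {
  const block_alg *alg;
  uint64_t block_state;
  uint8_t buf[MAX_BLOCK_LEN];
  size_t buf_len;
} stream;

void stream_init(stream *s, const block_alg *alg) {
  s-&gt;alg = alg;
  s-&gt;block_state = alg-&gt;init_state;
  s-&gt;buf_len = 0;
}

/* Written once for all algorithms: accept input of any length, buffer it,
 * and process a block each time the buffer fills up. */
void stream_update(stream *s, const uint8_t *data, size_t len) {
  while (len &gt; 0) {
    size_t room = s-&gt;alg-&gt;block_len - s-&gt;buf_len;
    size_t n = len &lt; room ? len : room;
    memcpy(s-&gt;buf + s-&gt;buf_len, data, n);
    s-&gt;buf_len += n;
    data += n;
    len -= n;
    if (s-&gt;buf_len == s-&gt;alg-&gt;block_len) {
      s-&gt;block_state = s-&gt;alg-&gt;process_block(s-&gt;block_state, s-&gt;buf);
      s-&gt;buf_len = 0;
    }
  }
}

/* Finalize a copy of the block state, so the stream itself stays usable. */
uint64_t stream_digest(const stream *s) {
  uint64_t copy = s-&gt;block_state;
  return s-&gt;alg-&gt;finish(copy, s-&gt;buf, s-&gt;buf_len);
}

/* Two toy instances with different block sizes. */
static uint64_t mix(uint64_t st, const uint8_t *p, size_t n, uint64_t k) {
  for (size_t i = 0; i &lt; n; i++) st = (st ^ p[i]) * k;
  return st;
}
static uint64_t a_block(uint64_t st, const uint8_t *b) { return mix(st, b, 8, 0x100000001b3ULL); }
static uint64_t a_finish(uint64_t st, const uint8_t *l, size_t n) { return mix(st, l, n, 0x100000001b3ULL); }
static uint64_t b_block(uint64_t st, const uint8_t *b) { return mix(st, b, 16, 31); }
static uint64_t b_finish(uint64_t st, const uint8_t *l, size_t n) { return ~mix(st, l, n, 31); }

const block_alg toy_a = { .block_len = 8, .init_state = 0xcbf29ce484222325ULL,
                          .process_block = a_block, .finish = a_finish };
const block_alg toy_b = { .block_len = 16, .init_state = 7,
                          .process_block = b_block, .finish = b_finish };
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">stream_init(&amp;s, &amp;toy_a)</code> or <code class="language-plaintext highlighter-rouge">stream_init(&amp;s, &amp;toy_b)</code> yields a streaming API for either toy algorithm with no change to the buffering code. The difference with the verified version is that the instantiation there happens statically, and the proof carries over along with the code.</p>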

<p>The first hitch is that there’s quite a big delta between what we presented in the paper (Listing
12), and <a href="https://github.com/hacl-star/hacl-star/blob/897e23d315c08f7a375408d60d1a4918477fcaa0/code/streaming/Hacl.Streaming.Interface.fsti#L254">what actually lives in the
repository</a>.
Specifically, capturing <em>any</em> block algorithm requires a lot of genericity.</p>
<ul>
  <li>The user may or may not specify the length of the final digest – for instance,
each SHA3 hash has a fixed output length, but the Shake variants produce a result whose length is
user-provided.</li>
  <li>The block algorithm may expect to receive a pre-input before the blocks of data. For e.g. SHA2,
the pre-input is empty, but for Blake2 in keyed hash mode, the pre-input is the key block.</li>
  <li>Blocks cannot be processed eagerly, because some algorithms (Blake2) do not allow for the final
block to be empty – this vastly complicates buffer management, and interacts with the pre-input.</li>
  <li>Some algorithms need to retain extra pieces of state: for instance, the key length for Blake2 can
be tweaked at initialization-time, but needs to be retained in the long-lived state.</li>
  <li>To avoid invalidating the block state upon intermediary digest extraction, our streaming API
copies the state under the hood – in some cases, it’s easier to stack-allocate this copy, but in
other cases, a heap-allocated copy makes more sense.</li>
  <li>In some cases, we wanted one API per algorithm (we have four APIs for
SHA2-{224,256,384,512}); in other cases, we wanted one API per algorithm family (we have one API
that covers all 6 Keccak algorithms: 4 variants of SHA3, and 2 variants of Shake).</li>
</ul>

<p>Getting to that level of genericity took multiple rounds, driven by the successive requirements of
Python. Ultimately, for HMAC, which was the final algorithm we landed in Python, we realized that
our proofs and definitions were generic enough that we did not need any further tweaks to
“instantiate” our generic streaming API with HMAC.</p>

<h2 id="a-bulletproof-build">A bulletproof build</h2>

<p>One highlight of submitting a PR to Python is that their infrastructure has more CI coverage than we
could possibly dream of: a complete build of Python runs over 50+ toolchains and architectures. The
flipside? We discovered some pretty annoying corner cases.</p>

<p>One particularly tricky build issue surfaced when dealing with HMAC. As a reminder,
<a href="https://en.wikipedia.org/wiki/HMAC">HMAC</a> is a generic construction that, given a hash algorithm,
provides a keyed message authentication code – in short, there is a high-level HMAC piece of code
that defers most of the work to individual hash algorithms. Each hash algorithm may itself come in a
variety of <em>implementations</em>: for instance, HMAC-Blake2b is implemented both by HMAC-Blake2b-32
(regular implementation) and HMAC-Blake2b-256 (AVX2 wide-vector implementation).</p>

<p>This already causes problems: <code class="language-plaintext highlighter-rouge">HMAC.c</code> may call functions from <code class="language-plaintext highlighter-rouge">Blake2b_256.c</code>, if Python is running
on a machine with AVX2. However, only <code class="language-plaintext highlighter-rouge">Blake2b_256.c</code> may be compiled with <code class="language-plaintext highlighter-rouge">-mavx2</code>: code from
<code class="language-plaintext highlighter-rouge">HMAC.c</code> will execute on all machines, even those without AVX2, meaning it must <em>not</em> be compiled
with <code class="language-plaintext highlighter-rouge">-mavx2</code>. So far, so good, and this is something we had done before.</p>

<p>The problem came with <code class="language-plaintext highlighter-rouge">HMAC.c</code> creating the initial state for <code class="language-plaintext highlighter-rouge">Blake2b_256.c</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;immintrin.h&gt;</span><span class="cp">
</span>
<span class="c1">// ...</span>
  <span class="n">__m256i</span> <span class="o">*</span><span class="n">blake2b_256_state</span> <span class="o">=</span> <span class="n">aligned_malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">__m256i</span><span class="p">)</span><span class="o">*</span><span class="mi">4</span><span class="p">);</span>
<span class="c1">// ...</span>

</code></pre></div></div>

<p>Most toolchains were happy with this code – <code class="language-plaintext highlighter-rouge">immintrin.h</code> defines the type <code class="language-plaintext highlighter-rouge">__m256i</code>, and even
though <code class="language-plaintext highlighter-rouge">HMAC.c</code> cannot assume AVX2 instructions are available, it’s not too hard for a
compiler to zero-initialize <code class="language-plaintext highlighter-rouge">blake2b_256_state</code> without resorting to AVX2 instructions… except,
some older compilers refused to process the <code class="language-plaintext highlighter-rouge">immintrin.h</code> header unless <code class="language-plaintext highlighter-rouge">-mavx2</code> was used, which
defeated the whole purpose.</p>

<p>This required a considerable amount of refactoring to use the well-known “C abstract struct”
pattern, which essentially defines an abstract type in C.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Blake2b_256.h</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">blake2b_256_st_s</span> <span class="n">blake2b_256_st</span><span class="p">;</span>
<span class="n">blake2b_256_st</span> <span class="o">*</span><span class="nf">blake2b_256_malloc</span><span class="p">();</span>

<span class="c1">// Blake2b_256.c</span>
<span class="cp">#include</span> <span class="cpf">&lt;immintrin.h&gt;</span><span class="cp">
</span><span class="k">typedef</span> <span class="k">struct</span> <span class="n">blake2b_256_st_s</span> <span class="p">{</span>
    <span class="n">__m256i</span> <span class="n">contents</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
<span class="p">}</span> <span class="n">blake2b_256_st</span><span class="p">;</span>

<span class="c1">// HMAC.c</span>
<span class="n">blake2b_256_st</span> <span class="o">*</span><span class="n">blake2b_256_state</span> <span class="o">=</span> <span class="n">blake2b_256_malloc</span><span class="p">();</span>
</code></pre></div></div>

<p>What made this extra difficult is that the C code is auto-generated from F*, which has a <em>very</em>
different notion of abstraction. The compiler that goes from F* to C,
<a href="https://github.com/FStarLang/karamel/">krml</a>, had to be overhauled to perform a much more
fine-grained analysis that handles various levels of visibility (public functions, library-internal
functions, translation-unit internal functions) even in the presence of such “abstract structs”.</p>

<h2 id="handling-memory-allocation-failures">Handling memory allocation failures</h2>

<p>While our original modeling of C in F* allowed reasoning about memory allocation failures, no one
had ever bothered to do so in practice. For Python, it was desirable to be able to propagate memory
allocation failures. This meant we had to refine our definition of a generic, mutable piece of state
(such as the block state); our definition of a block algorithm (such as SHA2-256); and our generic
streaming construction to all be able to propagate memory allocation failures all the way up to the
caller. Thankfully, this didn’t turn out to be a huge deal: we inserted <code class="language-plaintext highlighter-rouge">option</code> types all along the
way, and because we had one single generic streaming construction, the implementation and proofs had
to be updated only once for the 15+ concrete instances of the streaming API.</p>

<p>The presence of <code class="language-plaintext highlighter-rouge">option</code> types in the source compiles to tagged unions in the generated C; this is a
little verbose, and we may change our definition of a piece of state to feature a <code class="language-plaintext highlighter-rouge">has_failed</code>
run-time function that can assess whether a memory allocation failed, at the expense of more
complexity and verification effort.</p>
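
<p>To give an idea of what this looks like from the C side, here is a small hand-written sketch; it is not the code that krml generates. All type and function names are hypothetical, and the allocator is passed as a parameter only so that a failure can be simulated.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;

typedef struct { uint64_t words[4]; } toy_state;

/* Hypothetical C rendering of an option type: a tag, plus a payload that
 * is only meaningful when the tag is Opt_Some. */
typedef enum { Opt_None, Opt_Some } opt_tag;
typedef struct {
  opt_tag tag;
  toy_state *v;
} opt_state;

typedef void *(*alloc_fn)(size_t);

/* Allocation reports failure through the tag rather than a bare NULL. */
opt_state toy_state_malloc(alloc_fn alloc) {
  toy_state *p = alloc(sizeof(toy_state));
  if (p == NULL)
    return (opt_state){ .tag = Opt_None, .v = NULL };
  for (size_t i = 0; i &lt; 4; i++)
    p-&gt;words[i] = 0;
  return (opt_state){ .tag = Opt_Some, .v = p };
}

typedef enum { Toy_Success, Toy_OutOfMemory } toy_error;

/* The layer above inspects the tag and propagates the failure upwards. */
toy_error toy_stream_create(alloc_fn alloc, toy_state **out) {
  opt_state r = toy_state_malloc(alloc);
  if (r.tag == Opt_None)
    return Toy_OutOfMemory;
  *out = r.v;
  return Toy_Success;
}

/* An allocator that always fails, to exercise the error path. */
void *failing_alloc(size_t n) {
  (void)n;
  return NULL;
}
</code></pre></div></div>

<p>Passing <code class="language-plaintext highlighter-rouge">malloc</code> exercises the success path, and passing <code class="language-plaintext highlighter-rouge">failing_alloc</code> exercises the error path. The verbosity mentioned above is visible here: every layer has to unpack a tag before it can touch the payload.</p>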

<h2 id="propagating-changes-from-upstream-hacl-to-python">Propagating changes from upstream HACL* to Python</h2>

<p>My initial Python PR contained a shell script that would fetch the required files from the upstream
HACL* repository; ditch a bunch of superfluous definitions in headers via well-crafted sed
invocations; and tweak a few include paths in-place, also using my favorite refactoring tool (yes,
sed). The benefit was that the initial PR was lean and clean.</p>

<p>Later on, once it became clear that the upstream code was maintainable and pretty stable, that pile
of seds was eliminated, on the basis that it’s not the end of the world if a header contains a few
extra definitions, and it all makes maintenance easier.</p>

<p>Now, anyone who wishes to refresh HACL* can run the shell script in their checkout of Python, and
provided they tweak the expected hash in Python’s SBOM (software bill of materials), they are good
to go and can integrate the latest improvements.</p>

<h2 id="conclusion">Conclusion</h2>

<p>I’m delighted to see such a large-scale integration of verified cryptographic code in a flagship
project like Python. This demonstrates that verified cryptography is not only ready from an
academic perspective, but also mature enough to be integrated in real-world software while meeting
all engineering expectations. Thanks to everyone who helped along this journey!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[In November 2022, I opened issue 99108 on Python’s GitHub repository, arguing that after a recent CVE in its implementation of SHA3, Python should embrace verified code for all of its hash-related infrastructure.]]></summary></entry><entry><title type="html">First alpha release of HACL* in Rust</title><link href="http://jonathan.protzenko.fr/2024/03/20/hacl-rs.html" rel="alternate" type="text/html" title="First alpha release of HACL* in Rust" /><published>2024-03-20T08:00:00-07:00</published><updated>2024-03-20T08:00:00-07:00</updated><id>http://jonathan.protzenko.fr/2024/03/20/hacl-rs</id><content type="html" xml:base="http://jonathan.protzenko.fr/2024/03/20/hacl-rs.html"><![CDATA[<p>I recently wrote about <a href="https://jonathan.protzenko.fr/2024/01/05/eurydice.html">ongoing
efforts</a> to retarget the
compilation of HACL* from C to Rust. Today,
<a href="https://aymericfromherz.github.io/">Aymeric</a>, the entire HACL* team and I
are happy to announce that we have a first alpha release of HACL-rs! (Right on
time for the <a href="https://www.hacs-workshop.org/">HACS workshop</a>).</p>

<p>The goal of HACL-rs is to provide a fast, verified, <em>pure, safe Rust</em> library of
cryptographic primitives. In the long run, we simply expect HACL-rs to replace
the current HACL C code in <a href="https://github.com/cryspen/libcrux">libcrux</a>; this
will in turn remove the C FFI bindings and make it possible to use libcrux as a
pure safe Rust library.</p>

<p>We will present this work later in the year at <a href="https://sites.google.com/view/rustverify2024">Rust Verify
2024</a>, but we are making an early
announcement to gather initial feedback.</p>

<p>So far, the following algorithms are known to pass our test vectors:</p>
<ul>
  <li>hashes: sha1, sha2, sha3, blake2</li>
  <li>stream ciphers: chacha20, salsa20</li>
  <li>MACs: poly1305, hmac</li>
  <li>AEAD: chacha-poly</li>
  <li>bignums (all variants)</li>
  <li>signature: Ed25519, ECDSA-P256, RSA-PSS, FFDHE</li>
</ul>

<p>This is pretty much all of HACL, minus the multiplexing/agile EverCrypt APIs,
minus vectorized variants, and minus a few stray algorithms that we haven’t
gotten around to fixing yet (K256, HKDF).</p>

<p>The code is
<a href="https://github.com/hacl-star/hacl-star/blob/afromher_rs/dist/rs/src/">here</a>.
This is all extremely rough, and we are looking for the following kinds of
feedback:</p>
<ul>
  <li>performance: notably regressions from HACL-C</li>
  <li>API feedback: we understand that none of these are Rust-native APIs, but we’d
love to know about dealbreakers (e.g., too many <code class="language-plaintext highlighter-rouge">&amp;mut</code>s) as soon as possible,
as this will also shape the final libcrux API</li>
  <li>functional bugs: there is still the possibility of runtime failures, as I was
mentioning in my previous blog post; while we have plans to fix this once and
for all before the final release, any help finding those will save us time</li>
</ul>

<p>Please file issues, send emails, or find us at HACS!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[I recently wrote about ongoing efforts to retarget the compilation of HACL* from C to Rust. Today, Aymeric, myself and the entire HACL* team are happy to announce that we have a first alpha release of HACL-rs! (Right on time for the HACS workshop).]]></summary></entry><entry><title type="html">Rust verification and backwards compatibility</title><link href="http://jonathan.protzenko.fr/2024/01/05/eurydice.html" rel="alternate" type="text/html" title="Rust verification and backwards compatibility" /><published>2024-01-05T07:00:00-08:00</published><updated>2024-01-05T07:00:00-08:00</updated><id>http://jonathan.protzenko.fr/2024/01/05/eurydice</id><content type="html" xml:base="http://jonathan.protzenko.fr/2024/01/05/eurydice.html"><![CDATA[<p>Over the past few years, a lot of eyes in the software verification community
have turned towards Rust. That’s hardly a surprise: programs written in Rust are
easier to verify, owing to the language’s strong ownership discipline; the
absence of undefined behaviors; and its strong notion of value. As a result, we
now have a plethora of tools for Rust verification: Creusot, Prusti, Verus, and
of course our very own Aeneas… not to mention tools like Kani, Gillian-Rust,
and many more. In short, 2024, I believe, shall be the year of Rust
verification! (And the year of Linux on the desktop, too, of course.)</p>

<p>This is all great and exciting, but in practice, the transition to an all-Rust
ecosystem will take time and needs to happen in a gradual fashion. Neither
practitioners nor researchers are going to switch entire systems, codebases and
toolchains to Rust overnight. Specifically, two transitions need to happen:
existing verified codebases need to be ported to Rust, and new verified Rust
code needs a backwards-compatibility story for those users who can’t
unconditionally adopt Rust just yet.</p>

<p>This blog post introduces two new projects that address the transition
challenges above. The first project allows compiling the HACL* verified crypto
library to <em>safe</em> Rust (instead of C), thus bringing <em>old</em> code into the <em>new</em>
ecosystem. The second project, named <a href="https://github.com/AeneasVerif/eurydice">Eurydice</a>, compiles Rust to C (yes, you read
that right), in order to bring <em>new</em> code into <em>old</em> ecosystems. The two
projects are complementary, and connect in both directions our <em>old</em>, legacy
toolchains with the <em>new</em> world in which code is authored in Rust directly.</p>

<h1 id="modernizing-old-codebases">Modernizing old codebases</h1>

<p>HACL*, everyone’s favorite crypto library (or so I’m told), currently amounts
to 160k lines of verified code. As Churchill famously said, that’s a lot of
code, toil, tears and sweat. Simplifying a bit, HACL* is written in a subset of
F* called Low*, for low-level F*. Low* models C memory, C concepts (machine
integers, loops), and programs written in the Low* subset can be compiled to C
via a dedicated compiler called KaRaMeL (“K&amp;R meets ML”).</p>

<p>In spite of all the success of HACL* (parts of which have been integrated into
Linux, Python, Firefox, and many more), there are two fundamental limitations to
the way HACL* is currently authored.</p>
<ul>
  <li>First, reasoning about C memory is hard, and a lot of time is wasted on
mundane, boring memory reasoning, such as “these pointers are not aliased”, or
“the footprint of this data structure does not overlap with that other piece
of state on the heap”.</li>
  <li>Second, generated code remains, ultimately, unpalatable to downstream
consumers, no matter <a href="https://jonathan.protzenko.fr/2019/01/04/behind-the-scenes.html">how much effort you put in the quality of your
parentheses</a>.</li>
</ul>

<p>Perhaps you, the ever optimistic reader, are thinking that it’s fine, and that
these issues will be addressed for the next big project, such as verified MLS. I
wish I could share your enthusiasm: if there’s one message that emerged loud and
clear from the <a href="https://www.hacs-workshop.org/">HACS</a> workshop series, it’s that
users want a pure Rust crypto library – no bindings, just <em>safe</em> Rust!</p>

<p>Perhaps you, still the optimistic reader, might think that we just designed a
new toolchain, <a href="https://github.com/AeneasVerif/">Aeneas</a>, that aims to
address these issues once and for all, and not just for crypto. Indeed, by
taking Rust code written by programmers, Aeneas avoids the unsavory
code-generation step; by doing a <a href="https://arxiv.org/abs/2206.07185">pure
translation</a>, Aeneas relieves the programmer of
unnecessary memory reasoning.</p>

<p>But sadly, there is no world in which we have the resources, manpower and
justification to rewrite all of HACL* in Rust/Aeneas. This leaves us with one
option: tweak the code-generation to produce Rust code instead of C. This is
exactly what <a href="https://aymericfromherz.github.io/">Aymeric</a> and I have been up
to. The new project is called hacl-rs, and encompasses changes to both HACL*
and KaRaMeL.</p>

<p>The translation, for the time being, can be described as follows:</p>
<ul>
  <li>it happens <em>after</em> all of the compilation passes of KaRaMeL, such as
monomorphization, data type elimination, and so on – eventually, we want to
leverage Rust’s support for these features, but for now, the priority is to
get working code</li>
  <li>Low*’s arrays (a.k.a. the <code class="language-plaintext highlighter-rouge">buffer</code> type) compile to mutable borrowed slices;
Low*’s const arrays (a.k.a. the <code class="language-plaintext highlighter-rouge">const_buffer</code> type) compile to shared
borrowed slices</li>
  <li>as an optimization, stack allocations are initially given a Rust array type,
and a type-directed translation inserts a borrow to turn it into a slice when
the context expects it</li>
  <li>the Rust backend of KaRaMeL does a better job at choosing names and laying out
files in a way that is amenable to easy cargo compilation.</li>
</ul>

<p>Naturally, generating Rust out of Low* is not as trivial as adding a new
backend to the existing KaRaMeL compiler.</p>

<p>The first series of problems arises on the HACL* side, which has a slightly
irritating (but totally justified) pattern of “maybe in place” functions. Those
functions, once compiled to Rust, are certain to trip the borrow-checker. To
make things concrete, consider a simplified, albeit representative example: the
addition function for fixed-size, 256-bit bignums, as expressed in Low*.</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">(* 256-bit bignums are represented as an array of 8 32-bit unsigned integers *)</span>
<span class="k">let</span> <span class="n">bignum256_t</span> <span class="o">=</span> <span class="n">buffer</span> <span class="n">uint32_t</span> <span class="mi">8</span>

<span class="c">(* addition takes a pre-allocated destination bignum, and returns the carry as a
 * uint32_t; the Stack annotation means that the function does not
 * heap-allocate *)</span>
<span class="k">val</span> <span class="n">bn_add</span><span class="o">:</span> <span class="n">dst</span><span class="o">:</span><span class="n">bignum256_t</span> <span class="o">-&gt;</span> <span class="n">x</span><span class="o">:</span><span class="n">bignum256_t</span> <span class="o">-&gt;</span> <span class="n">y</span><span class="o">:</span><span class="n">bignum256_t</span> <span class="o">-&gt;</span> <span class="nc">Stack</span> <span class="n">uint32_t</span>
  <span class="p">(</span><span class="k">fun</span> <span class="n">h0</span> <span class="o">-&gt;</span>
    <span class="n">disjoint_or_equal</span> <span class="n">dst</span> <span class="n">x</span> <span class="o">/</span><span class="err">\</span> <span class="c">(* &lt;-- IMPORTANT BIT *)</span>
    <span class="n">disjoint</span> <span class="n">x</span> <span class="n">y</span> <span class="o">/</span><span class="err">\</span>
    <span class="n">disjoint</span> <span class="n">dst</span> <span class="n">y</span><span class="p">)</span>
  <span class="p">(</span><span class="k">fun</span> <span class="n">h0</span> <span class="n">r</span> <span class="n">h1</span> <span class="o">-&gt;</span>
    <span class="n">modifies</span> <span class="n">dst</span> <span class="n">h0</span> <span class="n">h1</span> <span class="o">/</span><span class="err">\</span>
    <span class="k">let</span> <span class="n">dst_nat</span> <span class="o">=</span> <span class="n">as_nat</span> <span class="n">h1</span> <span class="n">dst</span> <span class="k">in</span>
    <span class="k">let</span> <span class="n">x_nat</span> <span class="o">=</span> <span class="n">as_nat</span> <span class="n">h0</span> <span class="n">x</span> <span class="k">in</span>
    <span class="k">let</span> <span class="n">y_nat</span> <span class="o">=</span> <span class="n">as_nat</span> <span class="n">h0</span> <span class="n">y</span> <span class="k">in</span>
    <span class="n">dst_nat</span> <span class="o">==</span> <span class="p">(</span><span class="n">x_nat</span> <span class="o">+</span> <span class="n">y_nat</span><span class="p">)</span> <span class="o">%</span> <span class="mi">2</span><span class="o">^</span><span class="mi">256</span> <span class="o">/</span><span class="err">\</span>
    <span class="n">r</span> <span class="o">==</span> <span class="p">(</span><span class="n">x_nat</span> <span class="o">+</span> <span class="n">y_nat</span><span class="p">)</span> <span class="o">&gt;&gt;</span> <span class="mi">256</span><span class="p">)</span>

<span class="k">let</span> <span class="n">ladder</span> <span class="o">...</span> <span class="o">=</span>
  <span class="o">...</span>
  <span class="n">bn_add</span> <span class="n">dst</span> <span class="n">foo</span> <span class="n">bar</span><span class="p">;</span>
  <span class="n">bn_add</span> <span class="n">dst</span> <span class="n">dst</span> <span class="n">baz</span><span class="p">;</span>
  <span class="o">...</span>
</code></pre></div></div>

<p>This signature allows callers to potentially pass the <em>same</em> argument for both
<code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">dst</code>. The reasoning behind this signature is that it allows for an
efficient, in-place series of operations where the same bignum is modified
through a sequence of operations. Alas, this is exactly the sort of pattern that
Rust disallows! The function would compile and type-check per our translation
scheme, but would implicitly require its arguments to be disjoint. This means
that any call-site that passes aliased arguments will generate a borrow error.
There are ways to work around this, such as using interior mutability, but this
would be very inefficient.</p>

<p>Instead, there is a systematic rewriting pattern that works quite well,
leveraging F*’s compile-time reduction facilities:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">(* New wrapper around bn_add, for when we know dst and x are aliased *)</span>
<span class="n">inline_for_extraction</span> <span class="c">(* &lt;-- IMPORTANT BIT *)</span>
<span class="k">let</span> <span class="n">bn_add_aliased</span> <span class="p">(</span><span class="n">dst</span><span class="o">:</span> <span class="o">...</span><span class="p">)</span> <span class="n">x</span> <span class="n">y</span><span class="o">:</span> <span class="o">...</span> <span class="o">=</span> <span class="c">(* same signature as before *)</span>
  <span class="k">let</span> <span class="n">x_copy</span> <span class="o">=</span> <span class="n">copy</span> <span class="n">x</span> <span class="k">in</span>
  <span class="n">bn_add</span> <span class="n">dst</span> <span class="n">x_copy</span> <span class="n">y</span>

<span class="k">let</span> <span class="n">ladder</span> <span class="o">...</span> <span class="o">=</span>
  <span class="o">...</span>
  <span class="n">bn_add</span> <span class="n">dst</span> <span class="n">foo</span> <span class="n">bar</span><span class="p">;</span>
  <span class="n">bn_add_aliased</span> <span class="n">dst</span> <span class="n">dst</span> <span class="n">baz</span><span class="p">;</span>
  <span class="o">...</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">bn_add_aliased</code> function sports the exact same signature as its non-aliased
counterpart, meaning that call-sites remain unchanged, something we are adamant
about, since proofs at those call-sites might be fragile and not easily fixable.
Superficially, the <code class="language-plaintext highlighter-rouge">bn_add_aliased</code> function still violates, once translated,
the laws of Rust’s borrow-checker. Fortunately, the <code class="language-plaintext highlighter-rouge">inline_for_extraction</code>
keyword means that F* reduces this definition, leaving at call-site:</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">ladder</span> <span class="o">...</span> <span class="o">=</span>
  <span class="o">...</span>
  <span class="n">bn_add</span> <span class="n">dst</span> <span class="n">foo</span> <span class="n">bar</span><span class="p">;</span>
  <span class="k">let</span> <span class="n">dst_copy</span> <span class="o">=</span> <span class="n">copy</span> <span class="n">dst</span> <span class="k">in</span>
  <span class="n">bn_add</span> <span class="n">dst</span> <span class="n">dst_copy</span> <span class="n">baz</span><span class="p">;</span>
  <span class="o">...</span>
</code></pre></div></div>

<p>As far as the Rust borrow-checker is concerned, this is perfectly fine.</p>
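
<p>To make the borrow-checker’s point of view concrete, here is a minimal Rust
sketch of the two call-sites. It is an illustration only, not actual KaRaMeL
output: the Rust signature of <code class="language-plaintext highlighter-rouge">bn_add</code>, the little-endian limb order and the
test values are all assumptions made for this example.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// Hypothetical Rust counterpart of `bn_add`: writes x + y into `dst` and
/// returns the carry. Limbs are assumed to be little-endian.
fn bn_add(dst: &amp;mut [u32; 8], x: &amp;[u32; 8], y: &amp;[u32; 8]) -&gt; u32 {
    let mut carry = 0u64;
    for i in 0..8 {
        let sum = x[i] as u64 + y[i] as u64 + carry;
        dst[i] = sum as u32; // keep the low 32 bits
        carry = sum &gt;&gt; 32; // propagate the high bit
    }
    carry as u32
}

fn main() {
    let foo = [u32::MAX; 8]; // 2^256 - 1
    let bar = [1u32, 0, 0, 0, 0, 0, 0, 0];
    let baz = [2u32, 0, 0, 0, 0, 0, 0, 0];
    let mut dst = [0u32; 8];

    // Disjoint arguments: accepted as-is.
    let c1 = bn_add(&amp;mut dst, &amp;foo, &amp;bar);
    assert_eq!((c1, dst), (1, [0u32; 8]));

    // `bn_add(&amp;mut dst, &amp;dst, &amp;baz)` is rejected by the borrow checker:
    // `dst` would be borrowed mutably and immutably at the same time.
    // Copying first (arrays of u32 are `Copy`) resolves the conflict.
    let dst_copy = dst;
    let c2 = bn_add(&amp;mut dst, &amp;dst_copy, &amp;baz);
    assert_eq!((c2, dst), (0, [2, 0, 0, 0, 0, 0, 0, 0]));
}
</code></pre></div></div>

<p>The copy costs 32 bytes on the stack in this sketch, which is the price paid
for keeping the callee’s signature free of aliasing.</p>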

<p>The second series of problems concerns code-generation, that is, KaRaMeL’s
compilation scheme. Consider the following program, which is fine in Low* (<code class="language-plaintext highlighter-rouge">c</code>
comes before <code class="language-plaintext highlighter-rouge">b</code>, intentionally):</p>

<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">ladder</span> <span class="o">...</span> <span class="p">(</span><span class="n">abcd</span><span class="o">:</span> <span class="kt">array</span> <span class="n">uint32</span> <span class="mi">32</span><span class="p">)</span> <span class="o">=</span> <span class="c">(* four bignums side-by-side *)</span>
  <span class="k">let</span> <span class="n">a</span> <span class="o">=</span> <span class="n">abcd</span> <span class="o">+</span> <span class="mi">0</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">c</span> <span class="o">=</span> <span class="n">abcd</span> <span class="o">+</span> <span class="mi">16</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">b</span> <span class="o">=</span> <span class="n">abcd</span> <span class="o">+</span> <span class="mi">8</span> <span class="k">in</span>
  <span class="k">let</span> <span class="n">d</span> <span class="o">=</span> <span class="n">abcd</span> <span class="o">+</span> <span class="mi">24</span> <span class="k">in</span>
  <span class="o">...</span>
</code></pre></div></div>

<p>One can’t perform arbitrary pointer arithmetic in Rust! The only primitive
available is <code class="language-plaintext highlighter-rouge">split_at</code> (or <code class="language-plaintext highlighter-rouge">split_at_mut</code>, depending). For these cases, we
perform a little bit of static analysis on the fly and record the history of
pointer arithmetic operations with base <code class="language-plaintext highlighter-rouge">abcd</code> in a tree.
After the first operation, we record that <code class="language-plaintext highlighter-rouge">abcd</code> was split at index 0.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LOW* (SOURCE)     RUST (DESTINATION)                    TREE

let a = abcd+0    let r_a = abcd.split_at_mut(0)            a @ 0
</code></pre></div></div>

<p>At this stage, Rust’s <code class="language-plaintext highlighter-rouge">r_a</code> is a pair of slices, meaning a reference to <code class="language-plaintext highlighter-rouge">a</code> in
the source code compiles to <code class="language-plaintext highlighter-rouge">r_a.1</code> in Rust.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LOW* (SOURCE)     RUST (DESTINATION)                    TREE

let a = abcd+0    let r_a = abcd.split_at_mut(0)            a @ 0
let c = abcd+16   let r_c = r_a.1.split_at_mut(16)             \
                                                              c @ 16
</code></pre></div></div>

<p>We keep going, extending the tree to keep track of the relationships between
variables. Above, in the generated Rust, <code class="language-plaintext highlighter-rouge">r_c</code> is found in the right component
of the pair <code class="language-plaintext highlighter-rouge">r_a</code>, which it splits at index <code class="language-plaintext highlighter-rouge">16</code>. At that program point, a
reference to <code class="language-plaintext highlighter-rouge">a</code> becomes <code class="language-plaintext highlighter-rouge">r_c.0</code>, while a reference to <code class="language-plaintext highlighter-rouge">c</code> becomes <code class="language-plaintext highlighter-rouge">r_c.1</code>. In
other words, we assume that the <em>intent</em> of the programmer is that the slices be
non-overlapping. This static analysis continues, and eventually yields:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LOW* (SOURCE)     RUST (DESTINATION)                    TREE

let a = abcd+0    let r_a = abcd.split_at_mut(0)            a @ 0
let c = abcd+16   let r_c = r_a.1.split_at_mut(16)             \
let b = abcd+8    let r_b = r_c.0.split_at_mut(8)             c @ 16
let d = abcd+24   let r_d = r_c.1.split_at_mut(8)              /   \
                                                             b @ 8  d @ 8
</code></pre></div></div>

<p>Eventually, <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">b</code>, <code class="language-plaintext highlighter-rouge">c</code> and <code class="language-plaintext highlighter-rouge">d</code> compile to <code class="language-plaintext highlighter-rouge">r_b.0</code>, <code class="language-plaintext highlighter-rouge">r_b.1</code>, <code class="language-plaintext highlighter-rouge">r_d.0</code> and
<code class="language-plaintext highlighter-rouge">r_d.1</code> respectively.</p>
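
<p>Spelled out as plain Rust, the end state of the analysis for this example
could look as follows. This is a hand-written sketch rather than the code
KaRaMeL emits; the function name <code class="language-plaintext highlighter-rouge">split_four</code> and the tuple return type are
hypothetical, chosen only to make the example self-contained.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// Splits a 32-limb buffer into four disjoint 8-limb views, following the
/// same sequence of splits as the tree above (a, then c, then b, then d).
fn split_four(abcd: &amp;mut [u32]) -&gt; (&amp;mut [u32], &amp;mut [u32], &amp;mut [u32], &amp;mut [u32]) {
    let r_a = abcd.split_at_mut(0); // a @ 0: r_a.1 is the whole buffer
    let r_c = r_a.1.split_at_mut(16); // c @ 16: r_c.0 = [0, 16), r_c.1 = [16, 32)
    let r_b = r_c.0.split_at_mut(8); // b @ 8, relative to the start of r_c.0
    let r_d = r_c.1.split_at_mut(8); // d @ 24, i.e. 8 relative to c
    (r_b.0, r_b.1, r_d.0, r_d.1) // a, b, c, d
}

fn main() {
    // Fill the buffer with 0..32 so each view is easy to recognize.
    let mut storage: Vec&lt;u32&gt; = (0..32).collect();
    let (a, b, c, d) = split_four(&amp;mut storage);

    // All four views are live and mutable at the same time.
    assert_eq!((a[0], b[0], c[0], d[0]), (0, 8, 16, 24));
    assert_eq!((a.len(), b.len(), c.len(), d.len()), (8, 8, 8, 8));
    a[0] = 100;
    d[7] = 700;
    assert_eq!((storage[0], storage[31]), (100, 700));
}
</code></pre></div></div>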

<p>This optimistic compilation scheme, as it turns out, can handle a very large
number of cases in HACL* without requiring any modifications to the source,
which is a huge win for us! Of course, this only works because the crypto code
that lives in HACL* has a very specific shape and doesn’t perform
general-purpose pointer arithmetic.</p>

<p>This scheme has several drawbacks.</p>
<ul>
  <li>It cannot detect the case of actually overlapping slices, because pointer
arithmetic operations do not come with the length of the sub-array
(technically it’s erased by the time it reaches KaRaMeL).
This means that there might be out-of-bounds runtime errors in the generated
Rust. This, naturally, is an unpleasant property, and we plan to address it by
changing the Low* extraction scheme to retain the length of subarrays. In the
meantime, we seem to be lucky, as no such cases appear to happen in the
source HACL*.</li>
  <li>The scheme needs to be extended in case indices are not statically-known
integers; in practice, our code can account for symbolic terms, which may be
compared (e.g. <code class="language-plaintext highlighter-rouge">e &lt; e + 8</code>) and thus suitably related to one another in the
tree. If two indices cannot be compared, the code makes the assumption that
the user is splitting the array into chunks going left to right. Again, this
is an unpleasant source of imprecision and we would like to rewrite the source
code to get rid of all these cases.</li>
  <li>In addition, our scheme needs to deal with the (frequent) case where the code
restarts pointer arithmetic off of the base, with different indices. For
instance, our earlier example might be followed by <code class="language-plaintext highlighter-rouge">let ab = abcd + 0; let cd
= abcd + 16</code>, meaning we need to discard the previous tree and restart the
static analysis. Support for this has been recently implemented.</li>
  <li>Finally, we need to detect the intractable case where two different uses are
interleaved (e.g. use <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code>, use <code class="language-plaintext highlighter-rouge">ab</code>, then <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> again). Sadly,
this happens in HACL*, but should be easy to flag as an error in the code-gen
before we produce Rust code.</li>
</ul>

<h2 id="what-about-vale">What about Vale?</h2>

<p>Some readers might remember that the HACL* repository also hosts the Vale
algorithms, written in a deeply-embedded Intel x64 assembly DSL. Those have
their own compiler (more of a printer, really), which generates either assembly
files (<code class="language-plaintext highlighter-rouge">.S</code> or <code class="language-plaintext highlighter-rouge">.asm</code>), or inline assembly headers, i.e., C headers with <code class="language-plaintext highlighter-rouge">static
inline</code> functions containing <code class="language-plaintext highlighter-rouge">__asm__</code> blocks, a GCC-ism that allows <a href="https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html">writing
assembly directly within C
code</a>.</p>

<p>Our plan for those, currently implemented by Aymeric, is to retarget the printer
to generate Rust inline assembly syntax. This means that algorithms like
Curve25519’s 64-bit Intel ADX version, instead of generating a mixture of C and
inline ASM (as is currently done in HACL*), will generate a mixture of Rust and
Rust inline ASM. Rust and C inline assembly share many similarities; however,
Rust imposes a few additional restrictions, such as forbidding the use of the
<code class="language-plaintext highlighter-rouge">rbx</code> register for input and output operands, which will require small tweaks to
the verified Vale assembly.</p>

<h1 id="catering-to-legacy-environments">Catering to legacy environments</h1>

<p>As I mentioned above, Aeneas is the future! Crypto algorithms will be written in
Rust by the programmer, and verification elves will confirm that the code
exhibits all the required properties. But for all the excitement around Rust, a
large variety of contexts still require the use of a legacy toolchain. Your
project might be catering to vintage Unices from the 80s; or perhaps targets an
embedded environment for which only proprietary C toolchains are available; or
your management simply hasn’t reached enlightenment yet.</p>

<p>In spite of all of that, we still want to verify Rust, because it’s so much
easier to work on a functional model rather than stateful pointer-wielding code.
In order to get the best of both worlds, the Aeneas family is getting a new
Greek-named tool: <a href="https://github.com/AeneasVerif/eurydice">Eurydice</a>. Eurydice connects to Charon, the same Rust compiler
plug-in and frontend that Aeneas uses. Leveraging KaRaMeL as a library, Eurydice
compiles MIR down to C.</p>

<p>The first challenge is to compile MIR into KaRaMeL’s internal AST. MIR is very
regular, and has a clear distinction between arrays, which represent storage
space, and pointers, which represent only a single word. The C language, on the
other hand, is much murkier when it comes to semantics: there is no way to talk
about the <em>contents</em> of an array, and there are no rvalues of array types –
even though <code class="language-plaintext highlighter-rouge">x</code> may have type <code class="language-plaintext highlighter-rouge">int[4]</code>, <code class="language-plaintext highlighter-rouge">x</code> as an <code class="language-plaintext highlighter-rouge">rvalue</code> always contains an
implicit address-of operator, something which is explicit in MIR. Resolving that
discrepancy requires a type-directed translation step, which happens first in
the compilation pipeline.</p>

<p>Then, there are other semantic discrepancies – as I mentioned above, C arrays
always decay, meaning they cannot be returned by value: one uses an “outparam”
instead. Conversely, in Rust, one can very naturally return an array. This means
that array-returning Rust functions need to be rewritten into outparam-taking C
functions, <em>unless</em> the array is contained within a struct, in which case C
changes its mind and lets you pass the whole thing by value.</p>

<p>If anything, Eurydice has given me much greater appreciation for Rust’s clean
semantics, and a lot more bitterness about the state of the software industry,
seeing that C has been the <em>de facto</em> solution for so long. But I digress.</p>

<p>The rest of the 30 or so nano compilation passes in Eurydice are about code
quality and compiling away Rust features (function and data type polymorphism,
assigning arrays by value) that C does not support. Of those, half were already
found in KaRaMeL (or could be reused with minimal tweaks), and half were written
specifically for Eurydice.</p>

<p>One interesting tidbit was the compilation of slices: they compile to a C struct
(defined by hand) containing a base pointer (of type <code class="language-plaintext highlighter-rouge">void *</code>) and a length. Every
slice-using function is naturally polymorphic in Rust; I added support in
KaRaMeL for polymorphic opaque functions, which compile in C as a function call
receiving the type argument, intended to be implemented by hand as a macro. This
means that a Rust function like <code class="language-plaintext highlighter-rouge">copy_from_slice</code> is compiled generically by
Eurydice, then implemented by hand as follows:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define core_slice___Slice_T___copy_from_slice(dst, src, t) memcpy(dst.ptr, src.ptr, dst.len * sizeof(t))
</span></code></pre></div></div>

<p>Similarly, indexing a slice with a range is compiled generically and can be
hand-implemented as a macro that receives a slice <code class="language-plaintext highlighter-rouge">s</code>, a range <code class="language-plaintext highlighter-rouge">r</code> and a type
<code class="language-plaintext highlighter-rouge">t</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define Eurydice_slice_subslice(s, r, t) \
  ((Eurydice_slice){ .ptr = (void*)((t*)s.ptr + r.start), .len = r.end - r.start })
</span></code></pre></div></div>

<p>Note that we rely on C compound literals (standard since C99), so that we can have a struct as a
value, and that the type is essential here in order to perform correct pointer
arithmetic on a well-typed pointer.</p>
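
<p>For comparison, the same arithmetic can be checked on the Rust side. The
sketch below is not part of Eurydice: <code class="language-plaintext highlighter-rouge">RawSlice</code> and <code class="language-plaintext highlighter-rouge">subslice</code> are hypothetical
names, and the struct is specialised to <code class="language-plaintext highlighter-rouge">u32</code> for simplicity, whereas the C
version erases the element type behind <code class="language-plaintext highlighter-rouge">void *</code>. It computes the base pointer
and length by hand and compares them with what Rust’s own range indexing
produces.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::ops::Range;

/// Hypothetical Rust mirror of the hand-written C struct (base pointer + length).
struct RawSlice {
    ptr: *const u32,
    len: usize,
}

/// Same arithmetic as the C macro: advance the base pointer by `r.start`
/// elements and keep `r.end - r.start` of them.
fn subslice(s: &amp;[u32], r: Range&lt;usize&gt;) -&gt; RawSlice {
    // Unlike the C macro, check bounds explicitly before doing arithmetic.
    assert!(r.start &lt;= r.end &amp;&amp; r.end &lt;= s.len());
    RawSlice {
        ptr: s.as_ptr().wrapping_add(r.start),
        len: r.end - r.start,
    }
}

fn main() {
    let data = [10u32, 20, 30, 40, 50];
    let raw = subslice(&amp;data, 1..4);
    let native = &amp;data[1..4];

    // The hand-computed pair agrees with Rust's native subslice.
    assert_eq!(raw.ptr, native.as_ptr());
    assert_eq!(raw.len, native.len());
}
</code></pre></div></div>

<p>Because the pointer is typed here, the element size is implicit in the
addition; in the C macro, that is exactly the role played by the cast to <code class="language-plaintext highlighter-rouge">t*</code>.</p>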

<h1 id="status">Status</h1>

<p>Currently, about a dozen modules from HACL-rs compile to Rust, and some like
Curve25519 or Chacha-Poly have successfully been run and tested. This remains
preliminary work and we hope to have a non-trivial number of algorithms running
and passing test vectors very soon.</p>

<p>For the other direction, we have successfully extracted and compiled a complete
implementation of Kyber from Rust to C, with only minimal rewrites related to
ongoing support for traits.</p>

<h1 id="performance">Performance</h1>

<p>To everyone’s great surprise, performance has not been a concern in either
direction. When compiling HACL* to Rust, the performance is within 2% of the
original C, and merging those modifications that rewrite alias-taking functions
sometimes even improves performance of the C code by a little bit! When
compiling Kyber to C via Eurydice, there is no measurable performance difference
– intuitively, it probably all looks the same to LLVM.</p>

<h1 id="wrap-up">Wrap up</h1>

<p>The future is bright for Rust verification, and it looks like we have a plan for
transitioning our ecosystems to Rust. Join the effort, and hit us up if you’d
like to help out!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Over the past few years, a lot of eyes in the software verification community have turned towards Rust. That’s hardly a surprise: programs written in Rust are easier to verify, owing to the language’s strong ownership discipline; the absence of undefined behaviors; and its strong notion of value. As a result, we now have a plethora of tools for Rust verification: Creusot, Prusti, Verus, and of course our very own Aeneas… not to mention tools like Kani, Gillian-Rust, and many more. In short, 2024, I believe, shall be the year of Rust verification! (And the year of Linux on the desktop, too, of course.)]]></summary></entry><entry><title type="html">Verified Secure Group Messaging with MLS</title><link href="http://jonathan.protzenko.fr/2023/06/09/mls.html" rel="alternate" type="text/html" title="Verified Secure Group Messaging with MLS" /><published>2023-06-09T08:00:00-07:00</published><updated>2023-06-09T08:00:00-07:00</updated><id>http://jonathan.protzenko.fr/2023/06/09/mls</id><content type="html" xml:base="http://jonathan.protzenko.fr/2023/06/09/mls.html"><![CDATA[<p>Long gone are the days of my youth where AOL Messenger and IRC ruled online
communication spaces… in this day and age, people use WhatsApp, Signal,
Facebook Messenger, or even Instagram Messages (or so I am told). This new
generation of messenger apps provides some fundamental technological
improvements, such as Unicode Emoji instead of ASCII smileys, or perhaps more
interestingly to readers of this blog, End-to-End Encryption.</p>

<p>While I could write several blog posts about the many fascinating facets of
Unicode, including segmentation, encodings, normalization, and the <a href="https://blog.emojipedia.org/ninja-cat-the-windows-only-emoji/">creative
use</a> of
zero-width joiner codepoints to create new emojis… I would like to focus today
on <a href="https://messaginglayersecurity.rocks/">MLS</a>, a new standard in the space of
secure messaging protocols. Specifically, this blog post focuses on <em>verifying</em>
MLS, in order to establish with high assurance that it does, indeed, provide
End-to-End Encryption in the context of <em>group</em> messaging. This blog post is an
informal version of <a href="https://eprint.iacr.org/2022/1732">our paper</a>, which my
student Théophile (co-advised with Karthik B at INRIA) will present at Usenix
Security this summer.</p>

<p>Our paper sheds light on what it means to perform secure group management;
identifies the sub-component of the standard in charge of this and formalizes
it. This yields a reference implementation (for interop testing, or just for
other implementors’ inspiration), and also yields bugs or shortcomings in
the standard (which are now all fixed). Read on for more details.</p>

<h2 id="a-word-about-mls">A word about MLS</h2>

<p>Théophile has written an <a href="https://www.twal.org/blog/0001_what_is_mls/">excellent
introduction</a> to MLS. To recap
briefly: two-party end-to-end secure messaging has been widely studied, notably
through the Signal protocol, which powers many encrypted messengers (notably,
those mentioned above, but not AOL Messenger or IRC, obviously).
But oftentimes, users have multiple devices, and sharing keys between devices is
bad™️. Ergo, every conversation is actually a group conversation between multiple
devices, and for that we need something beyond Signal. Several solutions exist,
but they’re suboptimal (see Théophile’s blog post), which is why “the industry”
has been hard at work at the IETF devising a new standard called MLS that aims
to provide end-to-end encryption, for groups.</p>

<p>If that all sounds familiar, it is: other protocols like TLS have been going
through the same standardization process at the IETF. But unlike TLS, the
academic community was involved from the get-go in the design and security
analysis of MLS. Which begs the question: what properties are we trying to
establish here?</p>

<!-- This blog post is about *verification*, so let's begin with the goals of MLS, -->
<!-- then see which of those can be formally established. -->
<!-- Once again, the curious reader can header over to Théophile's -->
<!-- [blog](https://www.twal.org/blog/0001_what_is_mls/) for the full details. -->

<p>First, some basics: MLS only takes care of processing messages and maintaining
group data structures and corresponding cryptographic material. Crucially, MLS
assumes the existence of two important components, along with some behavioral
properties they must enjoy.
First, a delivery service, which is responsible for carrying
messages from one device to all the other participants in the group, retrying if
need be, handling eventual consistency, and so on. Second, an identity service
(sometimes called a directory) which associates some initial cryptographic
material to a given identity, and somehow guarantees that the directory entry
for Alice is indeed the correct one.</p>

<p>Now on to the expected properties of MLS. The functional requirements of MLS
are perhaps unsurprising: it should be able to deal with group members going
offline for periods of time then “catching up”; it should be scalable and avoid
some quadratic issues currently found in some of the existing solutions;
and it should support long-lived groups where people come and go over time.</p>

<p>The security requirements are the interesting part. We expect the usual
guarantees: authenticity (message from Alice is indeed from Alice),
confidentiality (Eve who isn’t in the group can’t decipher the messages),
forward secrecy (a key compromise at some point in time doesn’t expose
previous conversations) and post-compromise recovery (even if the key of Bob is
compromised, after a period of healing, the cryptographic material has been
refreshed and the attacker can’t eavesdrop any further).</p>

<h2 id="roster-agreement-and-its-security">Roster agreement, and its security</h2>

<p>All of these guarantees make sense as long as we can trust the membership of
the group. This is known as <em>roster agreement</em>, and states that
everyone in the group is on the same page as to <em>who exactly</em> belongs to the
group. This is a particularly tricky property to establish, and attacks <a href="https://eprint.iacr.org/2020/1327">have been
found</a> by others on exactly this property. This
brings us to <a href="https://eprint.iacr.org/2022/1732">our paper</a>. In it, we tackle
questions such as: what exactly <em>is</em> secure group management, how does one
capture those security properties, and how does one go about proving them.</p>

<p>The first contribution of the paper is to extricate and disentangle the core
group management from what is now a very large standard. We call it TreeSync,
and present it as its own, <em>generic</em> protocol that operates independently of the
other parts of MLS, which are concerned with epoch secrets and deriving message
encryption keys from those secrets. This in itself is novel: the current
standard does not identify TreeSync, and it can be difficult to understand what
is happening in such a large document. Our work makes it easier
to understand the standard from a security and functional standpoint.  Thus,
TreeSync lives on, with its mandate being to make sure that <em>everyone</em> in the
group has signed off (authenticated) the contents of the roster.</p>

<p>I called TreeSync a <em>protocol</em>: it operates on its own data structure
(unsurprisingly, a tree), and can process messages to enact addition, deletion,
or key refresh of the members currently in the group, meaning the tree evolves
over time. Identifying TreeSync as a protocol on its own allows us to i) better
understand past attacks as specifically targeting TreeSync, i.e. the group
management part of MLS, and ii) to find new attacks that were previously not
“visible” due to TreeSync being mixed up with the rest of MLS.</p>

<p>We formalize TreeSync in F*, sprinkle a generous amount of dependent types to
encode a variety of invariants, then use the existing
<a href="https://ieeexplore.ieee.org/document/9581188">DY*</a> framework to reason about
protocol security in the symbolic model. As always, using formal reasoning
forces us to think about the details: we ended up looking very closely at a core
part of TreeSync called “ParentHash”, which computes a digest
of the membership tree that TreeSync manages. Informally, the invariant of the
membership tree is that each member (leaf) signs (authenticates) the subtree
that is still in the same state as the leaf (hasn’t been updated since the leaf
was itself changed). This is a crucial piece of information, since it allows new
members to verify the signatures to ensure that membership is authenticated
(signed) by group members, not by an external attacker.</p>

<p>There were several problems, related to two core optimizations. The first
optimization is called “filtered nodes”, and essentially stems from the fact
that the membership tree is a complete binary tree, but that in general, some
of those nodes are empty. The second optimization is called “unmerged leaves”,
and allows de-coupling addition from authentication. Essentially, unmerged
leaves are waiting to be authenticated by the next refresh of secrets in the
tree (see the paper for a more precise explanation). Overall, our work not
only clarified the expected TreeSync invariant, using formal language, but also
found several situations in which the invariant could be broken. These in turn
defeated the security guarantees of MLS, and in particular could lead to a state
of confusion where not all group members agree on whether Alice is in the group
or not. All the fixes to the standard stemming from this work have been
integrated in the RFC.</p>

<p>There are plenty of other interesting details in the paper, including a
signature confusion attack, where the absence of a disambiguating label could
allow a signature to be reused in another part of the protocol (this is also
very bad™️). But I’ll just conclude this brief overview with a note about
implementation.</p>

<h2 id="actually-running-the-code">Actually running the code</h2>

<p>In this line of research, and especially in the sort of
protocols-meet-the-real-world projects that Karthik and I have been pursuing
over the past few years, having <em>actual</em> code has always been a priority. This
paper is no exception, and we took great care to ensure that our reference
implementation can actually be executed and is not just a bunch of lemmas. This
is not just for fun: without serious interop testing, you might be proving great
properties about a protocol that turns out to be MLS’, but not the actual MLS.</p>

<p>For this work, with the addition of byte-correct parsers and serializers (more
on that in a later blog post), we were able to extract our entire MLS specification. Unlike
previous work in Low* that extracts to C, this time, we extracted using F*’s
regular extraction pipeline to OCaml. (Side-note: we would like to have a verified
implementation of MLS in Rust using Aeneas, more on that also later.) As we had
seen previously in other works (Signal*, Noise*…), the performance of the
protocol is mostly dominated by its crypto primitives. What we did here is
compile MLS to JavaScript using <code class="language-plaintext highlighter-rouge">js_of_ocaml</code>, then “plug” the protocol part on
top of the efficient WebAssembly crypto primitives from our <a href="https://eprint.iacr.org/2019/542">2019
work</a> on compiling HACL* to WASM. The result
is a very decent implementation of MLS, which we actually integrated into a
prototype version of Skype. We were able to successfully converse in secure
group settings, and showed that one can be both specification-oriented <em>and</em>
have very decent performance.</p>

<p>There is much more to learn about MLS: Théophile is planning an in-depth dive
<a href="https://www.twal.org/blog/">on his blog</a>, meaning I can stop my overview here,
and redirect the curious reader to our paper, or the blog. Hit us up if you want
a copy of the implementation!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Long gone are the days of my youth where AOL Messenger and IRC ruled online communication spaces… in this day and age, people use WhatsApp, Signal, Facebook Messenger, or even Instagram Messages (or so I am told). This new generation of messenger apps provides some fundamental technological improvements, such as Unicode Emoji instead of ASCII smileys, or perhaps more interestingly to readers of this blog, End-to-End Encryption.]]></summary></entry><entry><title type="html">5 Years of Meta-Programming Cryptography</title><link href="http://jonathan.protzenko.fr/2022/05/22/meta-programming-cryptography.html" rel="alternate" type="text/html" title="5 Years of Meta-Programming Cryptography" /><published>2022-05-22T08:00:00-07:00</published><updated>2022-05-22T08:00:00-07:00</updated><id>http://jonathan.protzenko.fr/2022/05/22/meta-programming-cryptography</id><content type="html" xml:base="http://jonathan.protzenko.fr/2022/05/22/meta-programming-cryptography.html"><![CDATA[<p>For the past five years or so, I have been exploring, along with many other
collaborators, the space of verified cryptography and protocols. The premise
is simple: to deliver critical code that comes with strong properties of
correctness, safety and security. Some readers may be familiar with <a href="https://project-everest.github.io/">Project
Everest</a>: this is the umbrella under which
a large chunk of this work happened.</p>

<p>This blog post focuses on the specific topic of using meta-programming and
compile-time evaluation to scale up cryptographic verification. This theme has
been a key technical ingredient in almost every paper since 2016, with each
successive publication making a greater use of either one of these techniques.
The topic is of interest to my programming languages (PL)-oriented readers. And,
as far as I know, I haven’t really written anything that highlights this
unifying theme in the research arc of verified crypto and protocols.</p>

<p>These techniques all come together in our most recent paper on Noise*, in which
we use a Futamura projection and a combination of deep and shallow embeddings to
run a protocol compiler on the F* normalizer. The Noise* work is the
culmination of five years of working on PL techniques for cryptography, and it
will be presented this week at Oakland (S&amp;P). This is great timing for a
retrospective!</p>

<p>A disclaimer before I go any further: this is my non-professional blog, and this
post will inevitably capture my personal views and individual recollections.</p>

<h3 id="basic-partial-evaluation-in-low">Basic Partial Evaluation in Low*</h3>

<p>In 2017, we introduced the <a href="https://arxiv.org/abs/1703.00053">Low* toolchain</a>.
The core idea is that we can model a palatable subset of C directly within F*.</p>

<pre><code class="language-fstar">// Allocate a zero-initialized array of 32 bytes on the stack.
let digest = Array.alloca 32ul 0uy in
SHA2.hash digest input input_len
</code></pre>

<p>The special <code class="language-plaintext highlighter-rouge">Array</code> module is crafted so that F* enforces a C-like discipline,
permitting reasoning about liveness, disjointness, array lengths, and so on.
(There are many more such modules in Low*.) In the example above, F* would
typically check that i) <code class="language-plaintext highlighter-rouge">digest</code> and <code class="language-plaintext highlighter-rouge">input</code> are live pointers, that ii)
<code class="language-plaintext highlighter-rouge">input_len</code> represents the length of <code class="language-plaintext highlighter-rouge">input</code>, that iii) <code class="language-plaintext highlighter-rouge">digest</code> and <code class="language-plaintext highlighter-rouge">input</code> are
disjoint, and so on. Provided your code verifies successfully and abides by
further (mostly syntactic) restrictions, then a special compiler called
<a href="https://github.com/FStarLang/kremlin/">KaRaMeL</a> (née KReMLin) will turn it into C code.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint8_t</span> <span class="n">digest</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="n">SHA2_hash</span><span class="p">(</span><span class="n">digest</span><span class="p">,</span> <span class="n">input</span><span class="p">,</span> <span class="n">input_len</span><span class="p">);</span>
</code></pre></div></div>

<p>As we were developing what would become
<a href="https://eprint.iacr.org/2017/536">HACL*</a>, an issue quickly arose. For the
purposes of verification, it is wise to
split up a function into many small helpers, each with their own pre- and
post-conditions. This makes verification more robust; code becomes more
maintainable; and small helpers promote code reuse and modularity. However, at
code-generation time, this produces a C file with a multitude of three-line
functions scattered all over the place. This is not idiomatic, makes C code
harder to follow, and generally diminishes our credibility when we want
verification-agnostic practitioners to take us seriously. I <a href="https://jonathan.protzenko.fr/2019/01/04/behind-the-scenes.html">wrote
extensively</a>
about this a few years back: even though it’s auto-generated, producing
<em>quality</em> C code matters!</p>

<p>The initial workaround (added Dec 2016) was to add an ad-hoc optimization pass
to KaRaMeL, the F*-to-C compiler, using a new F* keyword, dubbed
<code class="language-plaintext highlighter-rouge">[@substitute]</code>. See <a href="https://github.com/project-everest/hacl-star/blob/896546b1c0c63eedcdc36662cad37b15febb3b03/code/salsa-family/Hacl.Impl.Chacha20.fst#L181">this vintage example</a>
for a typical usage in HACL*: at each call site, KaRaMeL would simply replace any call to a
function marked as <code class="language-plaintext highlighter-rouge">[@substitute]</code> with its body, substituting effective arguments
for formal parameters. This is simply an inlining step, also known as a
beta-reduction step. (The careful reader might notice that this would be
unsound in the presence of effectful function arguments, which may be evaluated
twice: fortunately, F* guarantees that effectful arguments are let-bound, and
thus evaluated only once.)</p>

<p>This was the first attempt at performing partial evaluation at compile-time in
order to generate better code, and got us going for a while. However, it had
several drawbacks.</p>
<ul>
  <li>F* already had a normalizer, that is, an interpreter built into the type
inference machinery, for the purpose of higher-order unification. In other
words, F* could already reduce terms at compile-time, and owing to its usage
of a Krivine Abstract Machine (KAM), was much better equipped than KaRaMeL to
do so! The KAM is well-studied, reduces terms in De Bruijn form, and is
formally described, which is much better than adding an ad-hoc reduction
step somewhere as a KaRaMeL nanopass.</li>
  <li>F*’s normalizer performs a host of other reduction steps; if beta-reductions
happen only in KaRaMeL, then a large number of optimizations are actually
impossible, because you need the composition of F*’s existing reduction
steps, and the desired extra beta-reductions, rather than one after the other.
In other words, you lose on how much partial evaluation you can trigger.</li>
</ul>

<p>To illustrate the latter point, consider the example below. Circa 2016, the
conditional could not be eliminated, because the
beta-reduction (in KaRaMeL) happened after the conditional elimination step (in
F*’s existing normalizer).</p>

<pre><code class="language-fstar">// St for "stateful"
[@substitute]
let f (x: bool): St ... =
  if x then
    ...
  else
    ...

let main () =
  f true
</code></pre>

<p>This was fixed in September 2017, when F* allowed its <code class="language-plaintext highlighter-rouge">inline_for_extraction</code>
keyword to apply to stateful beta-redexes. This suddenly “unlocked” many
more potential use-cases, which we quickly leveraged. The earlier example <a href="https://github.com/project-everest/hacl-star/blob/924ab395f8aa3cc8cd8a2281ba98eb99c36a947f/code/chacha20/Hacl.Impl.Chacha20.Core32.fst#L124">now
uses</a>
the new keyword.</p>

<h3 id="overloaded-operators-via-partial-evaluation">Overloaded Operators via Partial Evaluation</h3>

<p>In 2018, HACL* entered a deep refactoring that eventually culminated in all of
the original code being rewritten. That rewrite led to more concise,
reusable and effective code.</p>

<p>One of the key changes was overloaded operators. This is an interesting
technical feature, as it relies on the conjunction of four F* features:
compile-time reduction, implicit argument inference, fine-grained reduction
hints and cross-module inlining.</p>

<p>The problem is as follows. The Low* model of C in F* has one module per
integer type; effectively, we expose the base types found in <a href="https://en.cppreference.com/w/c/types/integer">C99’s
<code class="language-plaintext highlighter-rouge">&lt;inttypes.h&gt;</code></a>. Unfortunately,
this means each integer type (there are 9 of them) has its own set of operators.
This prevents modularity (functions need to be duplicated for e.g. <code class="language-plaintext highlighter-rouge">uint32</code> and
<code class="language-plaintext highlighter-rouge">uint64</code>), and hinders readability and programmer productivity.</p>

<p>Overloaded operators are defined <a href="https://github.com/project-everest/hacl-star/blob/master/lib/Lib.IntTypes.fsti#L826">as
follows</a>:</p>

<pre><code class="language-fstar">// IntTypes.fsti (obviously, a simplification!)
type int_width = | U32 | U64 | ...

inline_for_extraction
type int_type (x: int_width) =
  match x with
  | U32 -&gt; LowStar.UInt32.t
  | U64 -&gt; LowStar.UInt64.t

[@@ strict_on_arguments [0]]
inline_for_extraction
val (+): #w:int_width -&gt; int_type w -&gt; int_type w -&gt; int_type w
</code></pre>

<p>This interface file only introduces an abstract signature for <code class="language-plaintext highlighter-rouge">+</code>; the
definition is in the corresponding implementation file.</p>

<pre><code class="language-fstar">// IntTypes.fst
let (+) #w x y =
  match w with
  | U32 -&gt; LowStar.UInt32.add x y
  | U64 -&gt; LowStar.UInt64.add x y
</code></pre>

<p>Compared to our earlier <code class="language-plaintext highlighter-rouge">[@substitute]</code> example, this is a lot more
sophisticated! To make this work in practice, we need to combine several
mechanisms.</p>
<ul>
  <li>The <code class="language-plaintext highlighter-rouge">inline_for_extraction</code> qualifier indicates that this definition needs to be
evaluated and reduced at compile-time by F*. Indeed, the dependent type
<code class="language-plaintext highlighter-rouge">int_type</code> and the dependently-typed <code class="language-plaintext highlighter-rouge">+</code> do not compile to C, so any use of
these definitions must be evaluated away at compile-time.</li>
  <li>The <code class="language-plaintext highlighter-rouge">strict_on_arguments</code> attribute indicates that this definition should
<em>not</em> be reduced eagerly, as it would lead to combinatorial explosion in
certain cases. Rather, beta-reduction should take place only when the 0-th
argument to <code class="language-plaintext highlighter-rouge">+</code> is a constant, which guarantees the match reduces
immediately.</li>
  <li>The definition is a <code class="language-plaintext highlighter-rouge">val</code> in the interface, meaning it is abstract; this is
intentional, and prevents the SMT solver from “seeing” the body of the
definition in client modules, and attempting a futile case analysis. To make
this work, a special command-line flag must be passed to F*, called <code class="language-plaintext highlighter-rouge">--cmi</code>,
for cross-module inlining. It makes sure <code class="language-plaintext highlighter-rouge">inline_for_extraction</code> traverses
abstraction boundaries, but only at extraction-time, not verification-time.</li>
  <li>Finally, in order for this to be pleasant to use, we indicate to F* that it
ought to infer the width <code class="language-plaintext highlighter-rouge">w</code>, using <code class="language-plaintext highlighter-rouge">#</code> to denote an implicit argument.</li>
</ul>
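
<p>A minimal, self-contained sketch of how these mechanisms fit together is shown
next. It is an illustration under assumptions rather than the HACL* code: the
module name and the names <code class="language-plaintext highlighter-rouge">width</code>, <code class="language-plaintext highlighter-rouge">W32</code>, <code class="language-plaintext highlighter-rouge">W64</code>, <code class="language-plaintext highlighter-rouge">add_mod</code> and <code class="language-plaintext highlighter-rouge">double32</code> are
hypothetical; only two widths are covered; the integers come from the
<code class="language-plaintext highlighter-rouge">FStar.UInt32</code> and <code class="language-plaintext highlighter-rouge">FStar.UInt64</code> modules of the F* standard library; and the
operator is defined directly with a <code class="language-plaintext highlighter-rouge">let</code> instead of behind a <code class="language-plaintext highlighter-rouge">val</code>, so
<code class="language-plaintext highlighter-rouge">--cmi</code> plays no role here.</p>

<pre><code class="language-fstar">module OverloadSketch

// Hypothetical two-width version of the interface described above.
type width = | W32 | W64

// Type-level function: maps a width to a machine-integer type.
inline_for_extraction
let int_type (w: width) : Type0 =
  match w with
  | W32 -&gt; FStar.UInt32.t
  | W64 -&gt; FStar.UInt64.t

// Wrapping addition, overloaded over the width. The match on `w` is meant
// to reduce away once `w` is a constant (hence the strictness hint).
[@@ strict_on_arguments [0]]
inline_for_extraction
let add_mod (#w: width) (x y: int_type w) : int_type w =
  match w with
  | W32 -&gt; FStar.UInt32.add_mod x y
  | W64 -&gt; FStar.UInt64.add_mod x y

// A monomorphic client. The width is a constant here, so the intent is
// that only the 32-bit branch survives compile-time reduction.
let double32 (x: FStar.UInt32.t) : FStar.UInt32.t =
  add_mod #W32 x x
</code></pre>

<p>The width is passed explicitly in <code class="language-plaintext highlighter-rouge">double32</code> because F* may not be able to
infer it from an argument of type <code class="language-plaintext highlighter-rouge">FStar.UInt32.t</code> alone; stating the argument
types as <code class="language-plaintext highlighter-rouge">int_type W32</code> would be another way to guide inference.</p>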

<p>These abstractions form the foundation of HACL* “v2”. They are briefly described
in our <a href="https://eprint.iacr.org/2020/572">CCS Paper</a>.</p>

<p>(Note: this predates the type class mechanism in F*. The same effect would be
achievable with type classes, no doubt.)</p>

<h3 id="hand-written-templates-for-hashes">Hand-written Templates for Hashes</h3>

<p>Operators enable code-sharing in the small; we want to enable code-sharing in
the large. A prime example is the Merkle-Damgård (MD) family of hash functions
(MD5, SHA1, SHA2). All of these functions follow the <a href="https://github.com/project-everest/hacl-star/blob/d2a252358cadfa8e793d701616ba6d324ab90593/code/hash/Hacl.Hash.MD.fst#L267">same high-level
structure</a>:</p>

<pre><code class="language-fstar">// VASTLY simplified!
let hash (dst: array uint8) (input: array uint8) (input_len: size_t) =
  // Initialize internal hash state
  let state = init () in
  // Compute how many complete blocks are in input, relying on truncating
  // integer division, then feed them into the internal hash state.
  let complete_blocks = (input_len / block_size) * block_size in
  update_many_blocks state input complete_blocks;
  // Compute how many bytes are left in the last partial block. Use pointer
  // arithmetic, then process those leftover bytes as the "last block".
  let partial_block_len = input_len - complete_blocks in
  update_final_block state (input + complete_blocks) partial_block_len;
  // Computation is over: we "extract" the internal hash state into the final
  // digest dst.
  extract dst state
</code></pre>

<p>The <code class="language-plaintext highlighter-rouge">hash</code> function is generic (identical) across all hash algorithms;
furthermore, the <code class="language-plaintext highlighter-rouge">update_final_block</code> and <code class="language-plaintext highlighter-rouge">update_many_blocks</code> functions are
themselves generic, and only depend on the <em>block update function</em>
<code class="language-plaintext highlighter-rouge">update_block</code> which is specific to each hash in the family.</p>

<p>We could follow the trick we used earlier, and write something along the lines
of:</p>

<pre><code class="language-fstar">type alg = SHA2_256 | SHA2_512 | ...

let hash (a: alg) ... =
  let state = init a in
  update_many_blocks a state input; ...
</code></pre>

<p>Then, we would have to make sure none of the <code class="language-plaintext highlighter-rouge">a</code> parameters remain in the final
C code; this in turn would require us to inline all of the helper functions
such as <code class="language-plaintext highlighter-rouge">init</code>, <code class="language-plaintext highlighter-rouge">update_many_blocks</code>, etc. into <code class="language-plaintext highlighter-rouge">hash</code>, so as to be able to
write:</p>

<pre><code class="language-fstar">let hash_sha2_256 = hash SHA2_256
</code></pre>

<p>This works, and does produce first-order, low-level C code that contains no
run-time checks over the exact kind of MD algorithm; but this comes at the
expense of readability: we end up with one huge, illegible <code class="language-plaintext highlighter-rouge">hash</code>
function, with no trace left of the <code class="language-plaintext highlighter-rouge">init/update/finish</code> structure at the heart
of the construction. (There are other issues with this approach, notably what
happens when you have multiple implementations of e.g. SHA2/256.)</p>

<p>We can do much better than that, and generate quality low-level code without
sacrificing readability.  The trick is to write <code class="language-plaintext highlighter-rouge">hash</code> as a higher-order
function, but perform enough partial evaluation that by extraction-time, the
higher-order control-flow is gone.</p>

<pre><code class="language-fstar">inline_for_extraction
let mk_hash 
  (init: unit -&gt; St internal_state ...)
  (update_many_blocks: internal_state -&gt; array uint8 -&gt; size_t -&gt; St unit ...)
  ...
  (dst: array uint8)
  (input: array uint8)
=
  // same code, but this time using the function "pointers" passed as parameters
</code></pre>

<p>This function is obviously not the kind of function that one wants to see in C!
It uses function pointers and would be wildly inefficient, not to mention that
its types would be inexpressible in C.</p>

<p>But one can partially apply <code class="language-plaintext highlighter-rouge">mk_hash</code> to its first few arguments:</p>

<pre><code class="language-fstar">let hash_sha2_256 = mk_hash init_sha2_256 update_many_blocks_sha2_256 ...
</code></pre>

<p>At compile-time, F* performs enough beta-reductions that
<code class="language-plaintext highlighter-rouge">hash_sha2_256</code> becomes a <em>specialized</em> hash function that calls
<code class="language-plaintext highlighter-rouge">init_sha2_256</code>, <code class="language-plaintext highlighter-rouge">update_many_blocks_sha2_256</code>, etc., thus producing legible,
compact code that calls specialized functions.</p>

<pre><code class="language-fstar">// After beta-reduction
let hash_sha2_256 (dst: array uint8) (input: array uint8) (input_len: size_t) =
  // The structure of the function calls is preserved; we get intelligible code
  // as opposed to an inscrutable 500-line function body where everything has
  // been inlined.
  let state = init_sha2_256 () in
  let complete_blocks = (input_len / block_size) * block_size in
  update_many_blocks_sha2_256 state input complete_blocks;
  let partial_block_len = input_len - complete_blocks in
  update_final_block_sha2_256 state (input + complete_blocks) partial_block_len;
  extract_sha2_256 dst state
</code></pre>

<p>The technique can be used
recursively:</p>

<pre><code class="language-fstar">let update_many_blocks_sha2_256 = mk_update_many_blocks update_one_block_sha2_256
</code></pre>

<p>In short, to add a new algorithm in the MD family, it simply suffices to write
and verify its <code class="language-plaintext highlighter-rouge">update_one_block_*</code> function. Then, instantiating the <code class="language-plaintext highlighter-rouge">mk_*</code>
family of functions yields a completely specialized copy of the code, that bears
no trace of the original higher-order, generic infrastructure.</p>

<p>This is akin to a template in C++, where <code class="language-plaintext highlighter-rouge">hash&lt;T&gt;</code> is specialized by the
compiler into monomorphic instances for various values of <code class="language-plaintext highlighter-rouge">T</code>, except here, we
don’t need any special support in the compiler! This is all done with
beta-reductions and partial applications.</p>
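
<p>The same pattern can be tried on a toy, purely functional example. The sketch
below is a miniature built on assumptions, not the HACL* hash code: the module
name, <code class="language-plaintext highlighter-rouge">state</code>, <code class="language-plaintext highlighter-rouge">block</code>, the <code class="language-plaintext highlighter-rouge">*_toy</code> functions and the fixed two-block input
are all made up, and natural numbers stand in for Low* arrays and stateful code.</p>

<pre><code class="language-fstar">module TemplateSketch

// Toy stand-ins for the internal hash state and for an input block.
let state = nat
let block = nat

// The generic template: higher-order, and marked so that F* inlines it at
// each instantiation site instead of extracting it as-is.
inline_for_extraction noextract
let mk_hash
  (init: unit -&gt; state)
  (update_block: state -&gt; block -&gt; state)
  (finish: state -&gt; nat)
  (b0 b1: block)
  : nat
=
  let s = init () in
  let s = update_block s b0 in
  let s = update_block s b1 in
  finish s

// Algorithm-specific pieces for a made-up "hash".
let init_toy () : state = 7
let update_block_toy (s: state) (b: block) : state = s + b
let finish_toy (s: state) : nat = s

// Instantiating the template is a partial application.
let hash_toy : block -&gt; block -&gt; nat =
  mk_hash init_toy update_block_toy finish_toy

// Checked by normalization: 7 + 1 + 2 = 10.
let _ = assert_norm (hash_toy 1 2 == 10)
</code></pre>

<p>Adding a second toy algorithm would only require new <code class="language-plaintext highlighter-rouge">init</code>, <code class="language-plaintext highlighter-rouge">update_block</code> and
<code class="language-plaintext highlighter-rouge">finish</code> functions plus one more instantiation line; the template itself stays
untouched.</p>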

<p>This technique was developed, settled, and adopted circa 2019, and remains in
place to this day for hashes – it is briefly described in the <a href="https://eprint.iacr.org/2019/757">EverCrypt
Paper</a>.</p>

<h3 id="automated-templates">Automated Templates</h3>

<p>This technique works well for algorithms whose call-graph is not very deep.
Here, <code class="language-plaintext highlighter-rouge">hash</code> calls into <code class="language-plaintext highlighter-rouge">update_many_blocks</code>, which itself calls into
<code class="language-plaintext highlighter-rouge">update_one_block</code> – there are no further levels of specialized calls. But for
much larger pieces of code, manually restructuring many source files with a very deep
call graph can prove extraordinarily tedious. No one wants to manually add all
of those function parameters!</p>

<p>The solution is simply to perform this rewriting automatically with Meta-F*.
Meta-F* allows the user to write regular F* code and have it be executed at
compile-time; F* exposes its internals via a set of safe-by-construction APIs,
which in turn allows meta-programs to inspect definitions, perform proofs,
generate new definitions, and much more. This technique, known as
elaborator reflection, was pioneered by Lean and Idris.</p>

<p>In our case, the user writes their code “normally”, and adds attributes to
indicate which functions need to be rewritten into the <code class="language-plaintext highlighter-rouge">mk_*</code> form. Then, a
generic meta-program (written once) traverses the function graph, inspects
definitions, rewrites them using the <code class="language-plaintext highlighter-rouge">mk_*</code> pattern, and inserts the
higher-order-style functions into the program. This all happens automatically
without user intervention. All that is left for the user is to “instantiate” the
template for their particular choice of base functions (e.g.,
<code class="language-plaintext highlighter-rouge">let update_many_blocks_sha2_256 = mk_update_many_blocks update_one_block_sha2_256</code>).</p>

<p>The tactic was developed throughout late 2019, and was adopted at scale in early 2020.
At the time of this blog post, this “higher-order rewriting tactic”
remains the second
largest Meta-F* program ever written; most algorithms in HACL now
rely on the tactic. We use it to “instantiate” an algorithm with a choice of
<em>primitives</em> (e.g. HPKE with ChachaPoly+Curve25519+SHA256), a choice of
<em>implementations</em> (e.g. ChachaPoly with Chacha-AVX2+Poly-AVX2), or both. We
describe this technique in great detail in <a href="https://arxiv.org/abs/2102.01644">a
pre-print</a>.</p>

<h3 id="futamura-projection-with-recursion-on-the-normalizer">Futamura projection with Recursion on the Normalizer</h3>

<p>The final and most complex example in our partial evaluation journey is Noise*,
a protocol compiler for the <a href="https://noiseprotocol.org/noise.html">Noise Protocol
Framework</a>. Briefly, Noise is a DSL
(domain-specific language) that describes a series of key establishment
protocols, whose purpose is to establish a shared secret, hence a secure
communication channel, between two parties, known as the initiator and the
responder. A typical Noise program looks as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IKpsk2:
 &lt;- s
 ...
 -&gt; e, es, s, ss
 &lt;- e, ee, se, psk
</code></pre></div></div>

<p>Arrows indicate the flow of data: <code class="language-plaintext highlighter-rouge">-&gt;</code> flows from initiator to responder, and
conversely. The first line, before the <code class="language-plaintext highlighter-rouge">...</code>, indicates data that is available
out-of-band: in this case, the initiator knows the responder’s static (<code class="language-plaintext highlighter-rouge">s</code>) key.</p>

<p>The actual handshake happens below the <code class="language-plaintext highlighter-rouge">...</code>. Each token indicates a particular
cryptographic operation: <code class="language-plaintext highlighter-rouge">es</code>, for instance, indicates a Diffie-Hellman (DH)
operation between the initiator’s ephemeral key and the responder’s static key;
furthermore, it is implicitly understood that any further communication will be
encrypted using a key derived from the DH secret.</p>

<p>There are 59 protocols in the Noise family; naturally, we want to write and
verify an implementation only once! This new challenge is slightly more
complicated than what we saw previously with the hashes: a line is a list of
tokens, and a handshake is a list of lists of tokens. We thus need to operate
over recursive data structures at compile-time – our code better terminate!</p>

<p>We represent programs in the Noise DSL as a <em>deep embedding</em>:</p>

<pre><code class="language-fstar">type token = E | ES | S | SS
type step = list token
type handshake = list step
</code></pre>

<p>We write our code in what we call a “hybrid embedding” style. Parts of our code
operate over the deep-embedding, recurse over the steps of the handshake, and
maintain compile-time data, such as “this is the <code class="language-plaintext highlighter-rouge">i</code>-th step of the handshake”.
Other parts use the regular Low* shallow embedding and perform the actual
run-time operations.</p>

<p>The part of the code that operates over the deep embedding executes at
compile-time, and thus belongs to the first stage. After the first stage has
executed, all that remains is code for the second stage, using the Low* shallow
embedding, hence the name “hybrid embedding”.</p>

<p>Concretely, we write an interpreter for Noise programs, whose actions are in
Low*:</p>

<pre><code class="language-fstar">let rec eval_token (t: token) (s: state) =
  match t with
  | ES -&gt;
      // This is the Low* call and appears in the generated code. The
      // surroundings (let rec, match) are "compiler code" and reduce away at
      // compile-time.
      diffie_hellman s.encryption_key s.ephemeral s.remote_static
  | ...

and eval_step (tokens: step) (s: state) =
  match tokens with
  | t :: tokens -&gt;
      eval_token t s;
      eval_step tokens s
  | [] -&gt;
      ()

and ...
</code></pre>

<p>Thanks to the <a href="https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_projections">first Futamura
projection</a>,
we can partially apply the <code class="language-plaintext highlighter-rouge">eval_*</code> series of functions to one specific program
in the Noise DSL. In the case of the first handshake step of <code class="language-plaintext highlighter-rouge">IKpsk2</code>, we obtain:</p>

<pre><code class="language-fstar">eval_step [ E; ES; S; SS ] s

~&gt; // reduces to

eval_token E s;
eval_step [ ES; S; SS ] s

~&gt; // reduces to

eval_token E s;
eval_token ES s;
eval_step [ S; SS ] s

~&gt; // reduces to

generate_ephemeral s.ephemeral;
diffie_hellman s.encryption_key s.ephemeral s.remote_static
...
</code></pre>

<p>In other words, we have embedded a protocol compiler in F*’s normalizer; to
compile a Noise program, it suffices to perform a partial application and the
compiler runs at compile-time in F*. The compiler produces a complete protocol
stack, including state machine, defensive API, data structures for managing the
peer list, and state serialization/deserialization, fully specialized for the
given Noise protocol.</p>
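
<p>The compile-time half of this idea can be reproduced in a few lines of pure
F*. The sketch below is hypothetical throughout: the module name, the <code class="language-plaintext highlighter-rouge">action</code>
type and its constructors are invented stand-ins for the Low* calls, and the
interpreter returns the list of actions it would emit instead of performing
them. It only aims to show the normalizer running an interpreter over a deep
embedding, using <code class="language-plaintext highlighter-rouge">assert_norm</code> from the F* standard library.</p>

<pre><code class="language-fstar">module InterpreterSketch

// Deep embedding: the same four tokens as in the simplified type above.
type token = | E | ES | S | SS

// Invented stand-ins for the run-time operations.
type action =
  | GenerateEphemeral
  | DiffieHellman
  | SendStatic

// "Compiler code": decides which action a token stands for.
let eval_token (t: token) : list action =
  match t with
  | E -&gt; [GenerateEphemeral]
  | ES -&gt; [DiffieHellman]
  | S -&gt; [SendStatic]
  | SS -&gt; [DiffieHellman]

// Recursion over the deep embedding; it terminates because each call is
// on a strictly smaller list.
let rec eval_step (ts: list token) : list action =
  match ts with
  | [] -&gt; []
  | t :: ts' -&gt; FStar.List.Tot.append (eval_token t) (eval_step ts')

// The normalizer evaluates the interpreter on a concrete step; what is
// left is the straight-line sequence of actions.
let _ =
  assert_norm (eval_step [E; ES; S; SS] ==
               [GenerateEphemeral; DiffieHellman; SendStatic; DiffieHellman])
</code></pre>

<p>In this toy version the residual program is a list of data; in the hybrid
embedding described above, it is Low* code that goes on to be extracted to C.</p>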

<p>This technique allows for a fair degree of sophistication; we have
expression-level compiler steps (the <code class="language-plaintext highlighter-rouge">match</code>es) above; but we also have
type-level reduction steps: for instance, some fields of the state reduce to
<code class="language-plaintext highlighter-rouge">unit</code> when not needed for the chosen Noise protocol, which in turn guarantees
that they won’t appear in the resulting C code.</p>

<p>Naturally, making this work in practice requires a high degree of familiarity
with F*’s internals; and I would be lying if I said that we never pulled our hair out
trying to debug a normalizer loop! But the end result is conceptually quite
satisfying, and thanks to Son Ho’s incredible work, yielded a very exciting
<a href="https://eprint.iacr.org/2022/607">paper</a> that he will present this week at Oakland.</p>

<h3 id="looking-back">Looking back…</h3>

<p>Verifying code is <em>hard</em>, labor-intensive, and exhausting. Any technique that
makes the task easier yields substantial gains in productivity (and student
happiness). Opportunities abound for leveraging PL techniques to automate the
<em>production</em> of verified code! Instead of directly writing Low* code, it is
oftentimes much more efficient to write code that produces Low* code. The extra
degree of indirection pays off; and the most dramatic gains are achieved when
the team’s expertise combines cryptography and PL.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[For the past five years or so, I have been exploring, along with many other collaborators, the space of verified cryptography and protocols. The premise is simple: to deliver critical code that comes with strong properties of correctness, safety and security. Some readers may be familiar with Project Everest: this is the umbrella under which a large chunk of this work happened.]]></summary></entry><entry><title type="html">What’s new in Everest: Summer 2020</title><link href="http://jonathan.protzenko.fr/2020/08/13/whats-new-in-everest.html" rel="alternate" type="text/html" title="What’s new in Everest: Summer 2020" /><published>2020-08-13T12:07:00-07:00</published><updated>2020-08-13T12:07:00-07:00</updated><id>http://jonathan.protzenko.fr/2020/08/13/whats-new-in-everest</id><content type="html" xml:base="http://jonathan.protzenko.fr/2020/08/13/whats-new-in-everest.html"><![CDATA[<p>In a valiant attempt to wean myself off of
<a href="https://www.wired.com/story/stop-doomscrolling/">doomscrolling</a>, I thought I’d
try to write a few blog posts this summer. This one highlights some of the
exciting things that happened over the past few months in
<a href="https://project-everest.github.io/">Everest</a>, and
specifically around the <a href="https://hacl-star.github.io/">HACL and EverCrypt projects</a>.</p>

<h3 id="new-bindings-for-everest-cryptography">New bindings for Everest cryptography</h3>

<p>The big piece of news is that we now have official OCaml and JavaScript bindings
for our cryptographic code, a long-standing request from many consumers.</p>

<ul>
  <li>
    <p>The OCaml bindings were authored by Victor Dumitrescu. Thanks to Victor’s
patches, KreMLin, in addition to generating C code from F*, now also
<a href="https://github.com/FStarLang/kremlin/blob/master/src/GenCtypes.ml">generates matching ocaml-ctypes
bindings</a>
for the generated C code. The resulting OCaml package is called hacl-star-raw.
The API in hacl-star-raw is very low-level and does not enforce any of the
preconditions that are present in the F*
source code. To make things much more pleasant to use, Victor also authored a
<a href="https://github.com/project-everest/hacl-star/tree/master/bindings/ocaml">separate
package</a>
that wraps hacl-star-raw with a set of very idiomatic bindings that use
functors, nice high-level signatures and types. This latter package is called
simply <code class="language-plaintext highlighter-rouge">hacl-star</code> and can be installed with <code class="language-plaintext highlighter-rouge">opam install hacl-star</code>.</p>

    <p>One of the very nice things about calling HACL or EverCrypt from OCaml is that
all of the static preconditions can be checked at run-time! The functions
that HACL* or EverCrypt expose to clients have preconditions of the form
<code class="language-plaintext highlighter-rouge">disjoint x y</code>, where <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are arrays, or preconditions of the form
<code class="language-plaintext highlighter-rouge">length x = l</code>. Because we bind arrays as <code class="language-plaintext highlighter-rouge">bytes</code> in OCaml, the former check
boils down to <code class="language-plaintext highlighter-rouge">==</code> and the latter check boils down to <code class="language-plaintext highlighter-rouge">Bytes.length</code>. So, your
OCaml program will not segfault if you misuse the API – it is completely
safe.</p>
  </li>
  <li>
    <p>The JavaScript bindings build on our work on WebAssembly, and were authored by
Denis Merigoux. Last year, we published the <a href="https://ieeexplore.ieee.org/document/8835291">first fully formalized compilation
toolchain</a> from F* to
WebAssembly at IEEE S&amp;P’19. In that work, we presented (among other things) a
paper formalization of the Low* to WASM compilation, along with its
implementation in KreMLin, and its application to HACL*. This gave us
WHACL*, the compilation of HACL* to WASM via the formalized toolchain in the
KreMLin compiler.</p>

    <p>Of course, there’s quite a gap between writing a paper and finishing the
packaging, integrating the work under CI, fixing the long tail of small bugs
that prevent a smooth integration, etc. So, it wasn’t until a few months
ago that we were able to finally declare victory and publish WHACL*.</p>

    <p>Just like for OCaml, the JavaScript bindings are made up of two layers. The
lower layer is “raw” WASM code as output by the KreMLin compiler. Using this
code requires knowing not only the static preconditions that must be satisfied
by the clients, but also being aware of the KreMLin calling-convention, i.e.
how KreMLin-compiled WASM code expects JavaScript clients to lay out arguments
in memory before jumping into the WASM code.</p>

    <p>To make things easier for clients, Denis wrote a JavaScript wrapper that
hides all of these complexities and offers an idiomatic, “native” JavaScript
API based on ArrayBuffers. This has been published in the node package
repository and can be installed with <code class="language-plaintext highlighter-rouge">npm install hacl-wasm</code>.</p>

    <p>One of the highlights of our JavaScript package is that it now makes it easy
to use verified cryptography on the web, on the desktop (e.g. Electron) or on
the server (via node). You no longer need to wait for WebCrypto to include
the newer algorithms!</p>
  </li>
</ul>
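
<p>The run-time checks described above for the OCaml bindings are simple enough
to sketch. The snippet below is a Python analogue, not the code of the hacl-star
package: <code class="language-plaintext highlighter-rouge">check_length</code>, <code class="language-plaintext highlighter-rouge">check_disjoint</code> and <code class="language-plaintext highlighter-rouge">xor_into</code> are hypothetical names,
and the identity test merely stands in for OCaml’s physical equality.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def check_length(buf, expected):
    """Run-time analogue of the static precondition `length x = l`."""
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")


def check_disjoint(x, y):
    """Run-time analogue of `disjoint x y` for whole buffers.

    Two distinct bytearray objects never share storage, so an identity
    test is enough here (assumption: no sub-buffer views are passed).
    """
    if x is y:
        raise ValueError("buffers must be disjoint")


def xor_into(dst, src):
    """Hypothetical wrapper: validate the preconditions, then do the work."""
    check_length(src, len(dst))
    check_disjoint(dst, src)
    for i, b in enumerate(src):
        dst[i] ^= b
</code></pre></div></div>

<p>A misuse such as <code class="language-plaintext highlighter-rouge">xor_into(buf, buf)</code> then raises an exception instead of
silently violating the precondition.</p>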

<p>The documentation for both of these packages is
<a href="https://hacl-star.github.io/Obtaining.html#bindings-for-other-languages">online</a>.</p>

<h3 id="much-improved-packaging-and-distribution">Much improved packaging and distribution</h3>

<p>Over the past few months, we’ve steadily made great progress ensuring our code
can be consumed easily by anyone interested. This required work in many areas:</p>

<ul>
  <li>
    <p>The generated C code is now under version control on the master branch and
refreshed automatically every night (contribution by Franziskus Kiefer). This
means that you can always get the latest and freshest code!</p>
  </li>
  <li>
    <p>There is now a configure script that performs some amount of auto-detection
for your toolchain. This was important to enable compilation on ARM, where
some files just flat out don’t compile, because, say, they use 256-bit wide
vector instructions, for which there exists no ARM implementation.</p>
  </li>
  <li>
    <p>We’ve signed up for Travis CI to try compiling the generated C code. We are
now testing six different configurations: Linux/ARM64, Linux/AMD64 (4
variants), Windows/AMD64. This was motivated by the wonderful and humbling
cornucopia of compiler and toolchain bugs we found. Among the most delightful
issues we had:</p>
    <ul>
      <li>a version of Xcode refused to compile our inline assembly because its
register allocator bailed; no one was able to reproduce it, not even Apple,
so it looks like Travis is using the one exact build of Xcode with the
problem, and this build cannot be found anywhere else!</li>
      <li>GCC swapped the order of arguments of an intrinsic at some point throughout
its lifetime, but clang, which also defines <code class="language-plaintext highlighter-rouge">__GNUC__</code>, always had the right
order to begin with</li>
      <li>after fixing the order of arguments, <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81300">GCC still miscompiled the intrinsic</a></li>
      <li>no single toolchain agrees on a uniform way to securely zero-out memory</li>
      <li>compiler intrinsics are not found in the same header in MSVC vs. the rest of
the world</li>
      <li>and so many more…</li>
    </ul>
  </li>
</ul>

<p>As always, this is the most unrewarding kind of work: anyone would rather be
doing cool proofs than battling with Travis and debugging toolchain issues. But, it’s
my firm belief that high-quality packaging is essential to ensure the success of
Everest crypto. So, extra thanks to Victor Dumitrescu, Natalia Kulatova,
Marina Polubelova and Santiago Zanella-Béguelin for their help debugging and
nailing down some of these most vexing issues.</p>

<h3 id="new-incremental-apis">New incremental APIs</h3>

<p>Of course, many things happened on the code side, with new algorithms, new
vectorized APIs, and an increased usage of meta-programming that has
significantly lowered our code-to-proof ratio. All of this is covered in <a href="https://eprint.iacr.org/2020/572">our
ePrint</a>, submitted with amazing collaborators
Marina Polubelova, Karthikeyan Bhargavan, Benjamin Beurdouche, Aymeric Fromherz,
Natalia Kulatova and Santiago Zanella-Béguelin.</p>

<p>But rather than enumerate a long list of improvements, I’ll just focus on one
piece of work that I’m particularly excited about.</p>

<p>Many cryptographic primitives fall within a given family of algorithms that
share some common characteristics. For instance, Merkle-Damgård hashes, Poly1305
and Blake2 are all block algorithms. This means that they process exactly one
block of data at a time; clients who can’t provide data block-by-block must
perform buffering themselves, a classic source of bugs owing to
modulo-computations. Furthermore, block algorithms require clients to follow a
very precise state machine, with a sequence of functions oftentimes referred to
as <code class="language-plaintext highlighter-rouge">init</code>/<code class="language-plaintext highlighter-rouge">update_block</code>/<code class="language-plaintext highlighter-rouge">update_last</code>/<code class="language-plaintext highlighter-rouge">finish</code> – it is easy for clients to get this wrong.
Finally, there are very subtle but essential differences in the state machines;
for instance, clients may never call Blake2’s <code class="language-plaintext highlighter-rouge">update_last</code> with an empty block,
even though it’s ok to do so for SHA2.</p>

<p>In short, <code class="language-plaintext highlighter-rouge">init</code>/<code class="language-plaintext highlighter-rouge">update_block</code>/<code class="language-plaintext highlighter-rouge">update_last</code>/<code class="language-plaintext highlighter-rouge">finish</code> is a very error-prone, low-level API
and clients are probably better off using higher-level APIs that take care of
the buffering, state machine management, and abstract away the idiosyncrasies
of each algorithm.</p>

<p>Naturally, this is where formal verification comes in. We decided to start
looking higher up the stack, beyond bare cryptographic primitives. In our latest
work, we tackle the verification of the high-level APIs that perform internal
buffer management and rule out state machine errors by copying internal block
state as needed.</p>

<p>We have over a dozen implementations in our tree that are eligible for a
high-level API. Writing a copy of the high-level API for each would be bad
engineering, a poor use of our time, and of course, not very much fun.</p>

<p>Instead, we wrote a functor that takes as an argument a block-based algorithm,
and generates via meta-programming a Low* implementation of its corresponding
high-level API. The high-level API has a trivial state machine that admits no
user errors; clients can feed the data byte-by-byte thanks to internal
buffering; and by virtue of being the application of a single functor, all
high-level APIs are meant to be used in the exact same fashion, where unpleasant
state machine differences have all been abstracted away.</p>
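
<p>To give a feel for what the generated code does, here is a toy sketch of the
buffering logic in Python. It is not the Low* functor: the class name
<code class="language-plaintext highlighter-rouge">Streaming</code> and the toy block functions are made up for the example, and the
checksum is not a real hash.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class Streaming:
    """Byte-by-byte front end for a block algorithm (illustrative only)."""

    def __init__(self, block_len, init, update_block, update_last, finish):
        self.block_len = block_len
        self.update_block = update_block
        self.update_last = update_last
        self.finish = finish
        self.state = init()
        self.buf = bytearray()  # data not yet handed to the block function

    def update(self, data):
        self.buf += data
        # Flush full blocks, but always keep the most recent data buffered,
        # so update_last is not given an empty block once data has been fed.
        full = (len(self.buf) - 1) // self.block_len
        for _ in range(full):
            block = bytes(self.buf[:self.block_len])
            del self.buf[:self.block_len]
            self.state = self.update_block(self.state, block)

    def digest(self):
        # Work from the current state without modifying it, so the caller
        # may keep calling update() afterwards.
        last = self.update_last(self.state, bytes(self.buf))
        return self.finish(last)


# A toy block "algorithm" with 4-byte blocks; the state is a plain integer.
def toy_init():
    return 0


def toy_update_block(state, block):
    return (state * 31 + sum(block)) % 2**32


def toy_update_last(state, rest):
    return (state * 31 + sum(rest) + len(rest)) % 2**32


def toy_finish(state):
    return state.to_bytes(4, "big")


def new_toy():
    return Streaming(4, toy_init, toy_update_block, toy_update_last, toy_finish)
</code></pre></div></div>

<p>With this wrapper, feeding the data one byte at a time or all at once yields
the same digest, and asking for a digest midway does not disturb the state.</p>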

<p>Thanks to a judicious use of meta-programming and very fine-grained control of
meta-time partial evaluation, the resulting code has no cruft. The functor has
several tweaking knobs, controlling for instance whether the resulting Low* code
will have to perform key management (e.g. Poly1305), or whether this is not
needed (e.g. SHA2). In the latter case, the relevant struct fields and function
parameters get eliminated via partial evaluation.</p>

<p>These new high-level APIs are now all <a href="https://github.com/project-everest/hacl-star/blob/master/dist/gcc-compatible/Hacl_Streaming_SHA2_256.h">available on
master</a>,
and are known as the “Streaming” APIs. Thanks to Aymeric Fromherz and heroic
work by Son Ho, Blake2 has been brought under the same API, meaning we can now
trivially generate a byte-by-byte API for any implementation of SHA, Poly1305 or
Blake2.</p>

<p>We plan to add SHA3 to the set of available streaming APIs, and also adopt the
same approach for other families of algorithms, such as CTR encryption. And, of
course, write a paper about this all. Stay tuned!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[In a valiant attempt to wean myself off of doomscrolling, I thought I’d try to write a few blog posts this summer. This one highlights some of the exciting things that happened over the past few months in Everest, and specifically around the HACL and EverCrypt projects.]]></summary></entry><entry><title type="html">GitHub strange</title><link href="http://jonathan.protzenko.fr/2019/12/08/github-investigations.html" rel="alternate" type="text/html" title="GitHub strange" /><published>2019-12-08T09:00:00-08:00</published><updated>2019-12-08T09:00:00-08:00</updated><id>http://jonathan.protzenko.fr/2019/12/08/github-investigations</id><content type="html" xml:base="http://jonathan.protzenko.fr/2019/12/08/github-investigations.html"><![CDATA[<p>Earlier this year, I was paired up with my colleagues
<a href="https://www.microsoft.com/en-us/research/people/cbird/">Chris</a>,
<a href="https://www.microsoft.com/en-us/research/people/tzimmer/">Tom</a>,
<a href="https://www.microsoft.com/en-us/research/people/madanm/">Madan</a> and intern
Danielle Gonzalez (from Rochester) for the annual Microsoft Hackathon. The goal
was to examine and extract meaningful data from
<a href="http://ghtorrent.org/">GHTorrent</a>, a colossal data set that contains an
exhaustive log of all GitHub events, for all repositories.</p>

<p>Nothing went quite as planned, and we (of course) hit more issues than we
anticipated, but in the short span of a couple days we still managed to make
some interesting discoveries. In particular, we found some really weird GitHub
repositories that to this day raise more questions than answers… Here’s
(finally) a writeup!</p>

<h2 id="a-monstrous-dataset">A monstrous dataset</h2>

<p>The dataset was colossal: even on a fine, 20MB/s network
connection, merely <em>downloading</em> 100GB of data still takes about two hours…
and that’s just for a month’s worth of data. So, there went the first few hours
of the hackathon.</p>

<p>The next issue was simply that my machine, even with 1TB of disk, just didn’t
have enough space to load the downloaded SQL dump into a local MySQL database;
even if I had had the space, according to our extrapolations, this would’ve
taken several days. So, we had to come up with a plan B.</p>

<p>A quick language shootout ensued, where we each wrote a quick-and-dirty script
that would parse a couple of the SQL dumped tables (in CSV format) and try to
compute some basic queries by hand. Perl, Unix shell (grep &amp; co), and OCaml
were all attempted. We found that using the excellent
<a href="https://github.com/Chris00/ocaml-csv">CSV</a> package for OCaml gave the fastest
results (much more so than a hand-written lexer); they even had the exact option
we needed to parse the specific escaping format used by SQL dumps.</p>
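
<p>For illustration, here is roughly what such a quick-and-dirty counting script
looks like, sketched in Python with the standard <code class="language-plaintext highlighter-rouge">csv</code> module rather than the
OCaml package. The three-column layout (commit id, project id, message) and the
function name are assumptions made for the example, not the actual GHTorrent
schema.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import csv
import io
from collections import Counter


def commits_per_project(lines, project_column):
    """Count rows per project in a CSV dump using backslash escapes."""
    # SQL-style dumps escape quotes as \" instead of doubling them.
    reader = csv.reader(lines, escapechar="\\")
    counts = Counter()
    for row in reader:
        counts[row[project_column]] += 1
    return counts


# Made-up sample rows: commit id, project id, commit message.
sample = io.StringIO(
    '1,42,"fix \\"quoted\\" bug"\n'
    '2,42,"tick"\n'
    '3,7,"init"\n'
)
top = commits_per_project(sample, project_column=1).most_common(1)
print(top)  # [('42', 2)]
</code></pre></div></div>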

<h2 id="the-strange-world-of-github-repositories">The strange world of GitHub repositories</h2>

<p>We then attempted to answer a very simple question in our remaining time: <strong>which
repositories have the most commits</strong>? (Over the course of that month.)</p>

<p>It turns out that because we were aggregating and folding over the entire
event stream of GitHub, instead of counting the number of commits currently in the
tree, we appear to have counted the number of commits <em>pushed to a given
repository over the chosen time period</em>. The results were not what we expected, and
uncovered repositories quite unlike the ones you can read about, e.g. in this <a href="https://www.quora.com/Which-GitHub-repo-has-the-most-commits">Quora
question</a>.</p>

<h3 id="first-place-tmp_clock_repo">First place: <code class="language-plaintext highlighter-rouge">tmp_clock_repo</code></h3>

<p>We initially suspected an error in our script: the first repository on our list
was <a href="https://github.com/efarbereger/tmp_clock_repo">https://github.com/efarbereger/tmp_clock_repo</a>, with over <strong>13 million
commits</strong>. We immediately went to the project page, only to find a nondescript
repository, with no files in it, only 1470 commits, the latest of which was
several days ago. Only after we navigated <a href="https://github.com/efarbereger">back up to the
author’s page</a> did everything suddenly become
clear.  Eric Farber-Eger, an unsung hero of version control, has a cron job that
every five minutes pushes an entire new history to his repo, crafted so that the
GitHub heat map of his contributions forms a digital LCD clock. And, <code class="language-plaintext highlighter-rouge">31 * 24 *
( 60 / 5 ) * 1470</code> is about 13 million, so this checks out.</p>
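
<p>Spelling out that back-of-the-envelope computation, under the assumptions of a
31-day month, one push every five minutes, and each push carrying the full
1470-commit history (the variable names are just for the example):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>days = 31
pushes_per_hour = 60 // 5   # one push every five minutes
commits_per_push = 1470     # the whole history is re-pushed each time

total = days * 24 * pushes_per_hour * commits_per_push
print(total)  # 13124160
</code></pre></div></div>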

<p><img src="/misc/clock.png" alt="the aforementioned clock" /></p>

<p>A quick aside: Git allows one to entirely rewrite the history by force-pushing,
and commit metadata is only indicative; in particular, the date of a given
commit can be chosen arbitrarily, either by using a Git library directly (e.g.
<code class="language-plaintext highlighter-rouge">libgit2</code>) or via the <code class="language-plaintext highlighter-rouge">--date</code> option of the command-line frontend. Thus, the
heatmap can be used as a virtual LCD display where each pixel is addressable by
writing to the (fresh) history commits for that given time period.</p>

<p>I don’t know who Eric is; a Google search doesn’t seem to yield many results;
but he has my eternal admiration.</p>

<p>For interested readers, there seems to be an
<a href="https://github.com/gelstudios/gitfiti">entire</a>
<a href="https://github.com/bayandin/github-board">set</a> of
<a href="https://codepen.io/sebdeckers/pen/vOXeKV">libraries</a> dedicated to the very task
of pushing pixel art on GitHub heatmaps. Some people’s creativity is just
astonishing.</p>

<h3 id="second-place-historyclockimage">Second place: <code class="language-plaintext highlighter-rouge">historyclockimage</code></h3>

<p>In second place was a now-defunct project called <code class="language-plaintext highlighter-rouge">historyclockimage</code>, at nearly
5 million commits pushed. Quite unsurprisingly, it was under username
<code class="language-plaintext highlighter-rouge">efarbereger</code>. My curiosity and admiration for this mystery man only grew
stronger.</p>

<h3 id="third-place-blocklist-ipsets">Third place: <code class="language-plaintext highlighter-rouge">blocklist-ipsets</code></h3>

<p>In third place was <a href="https://github.com/firehol/blocklist-ipsets">https://github.com/firehol/blocklist-ipsets</a>, with 3.5
million commits pushed over time, but only one commit in the history. It turns
out that this is just one instance of people using GitHub as a cloud storage
provider, to store a variety of files which, conceivably, can be easily updated
by clients via the use of a simple Git pull.</p>

<p>I guess there is something to be said for the simplicity of the Git workflow? Or
is it that setting up plain cloud storage is too much work for trivial
use-cases?</p>

<h3 id="fourth-place-heartbeat">Fourth place: <code class="language-plaintext highlighter-rouge">heartbeat</code></h3>

<p>In fourth place, with 1.6 million commits, was perhaps the most intriguing
repository: <a href="https://github.com/19h/heartbeat">https://github.com/19h/heartbeat</a>. The description reads:
“Emergency signed life insurance files.”. The contents? Three files: a cryptic
README, whose first line is <code class="language-plaintext highlighter-rouge">GCM R 20/0c/400 L 20/0c/400</code> followed by some
base64 data, which once decoded, does not seem to have any structure or
contents. Beyond the README are two files, <code class="language-plaintext highlighter-rouge">lkLocation</code> and <code class="language-plaintext highlighter-rouge">lkHeartbeat</code>.
I could find very little on this repository, except for a brief <a href="https://www.reddit.com/r/coding/comments/4rbsu8/live_github_commit_messages/d4zz81i/">Reddit
thread</a>.</p>

<p>What is this mysterious project? Is this a cloud-distributed, modern version of
the supposed dead-hand radio <a href="https://en.wikipedia.org/wiki/UVB-76">UVB-76</a>?</p>

<h3 id="fifth-place-update">Fifth place: <code class="language-plaintext highlighter-rouge">update</code></h3>

<p>In fifth place, with 1.2 million commits:
<a href="https://github.com/shenzhouzd/update">https://github.com/shenzhouzd/update</a>, which very simply reads: “This
repository has been disabled.  Access to this repository has been disabled by
GitHub staff due to excessive use of resources”. What have you done,
<code class="language-plaintext highlighter-rouge">shenzhouzd</code>? Why is <a href="https://github.com/shenzhouzd/update1">https://github.com/shenzhouzd/update1</a> empty?</p>

<h3 id="special-mention-ci-logs">Special mention: CI logs</h3>

<p>Some familiar faces appear further down the line (positions 16 and 52), with
<a href="https://github.com/avsm/mirage-ci-logs">CI</a>
<a href="https://github.com/avsm/ocaml-ci.logs">logs</a> for the impressive infrastructure
deployed by our friends at Cambridge Labs.</p>

<h3 id="very-special-mention-tv-playlists">Very special mention: TV playlists?!!</h3>

<p>Further down were <a href="https://github.com/tumhopaasmere/tumhopaasmere">a</a>
<a href="https://github.com/jobhimain/jobhimain">set</a>
<a href="https://github.com/kunfayakun/kunfayakun">of</a>
<a href="https://github.com/triforcecoin/triforcecoin">repositories</a> all sharing the same characteristics:
a single user, with a single repository of the same name, containing a single
file: <code class="language-plaintext highlighter-rouge">lists/plex.txt</code>. The file appears to be a curated playlist of video
channels from across the world; radio stations; and a weird mix of movies hosted
on a public website. Is there some hotel, somewhere, in some part of the world,
where the VOD system pulls its data from GitHub? Who has curated this list and
decided that all the Matrix and Home Alone sequels should be included?</p>

<p><strong>Update:</strong> my colleague Tom points out that this seems to be related to
<a href="http://ccloudtv.org/">http://ccloudtv.org/</a>. I’m unsure why there are many distributed playlists on
remote GitHub repositories. Probably for the convenient cloud storage, once again?</p>

<h2 id="methodology--conclusion">Methodology &amp; conclusion</h2>

<p>Looking back on the data, this is only a very partial view; after all, it’s only
over a given month of a given year. It says nothing about the “overall”
importance of a given repository; Linux, for instance, is way down.</p>

<p>The goal, however, was not to have super-serious results anyhow, but just to
experiment over a short span of time. I’m glad we found oddities
and quirky repositories! There are many more mysteries in this list, which I’ve
<a href="/misc/commits_per_project.csv">uploaded online</a>. I’d be happy to hear readers’
theories on the mysterious playlists and the life insurance policy.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Earlier this year, I was paired up with my colleagues Chris, Tom, Madan and intern Danielle Gonzalez (from Rochester) for the annual Microsoft Hackathon. The goal was to examine and extract meaningful data from GHTorrent, a colossal data set that contains an exhaustive log of all GitHub events, for all repositories.]]></summary></entry><entry><title type="html">The EverCrypt verified cryptographic provider</title><link href="http://jonathan.protzenko.fr/2019/04/02/evercrypt-alpha1.html" rel="alternate" type="text/html" title="The EverCrypt verified cryptographic provider" /><published>2019-04-02T10:00:00-07:00</published><updated>2019-04-02T10:00:00-07:00</updated><id>http://jonathan.protzenko.fr/2019/04/02/evercrypt-alpha1</id><content type="html" xml:base="http://jonathan.protzenko.fr/2019/04/02/evercrypt-alpha1.html"><![CDATA[<p>Today, we’re announcing a preview release of
<a href="https://github.com/project-everest/hacl-star/releases/tag/evercrypt-v0.1alpha1">EverCrypt</a>, a
verified cryptographic provider that offers a comprehensive collection of
cryptographic algorithms.  EverCrypt automatically selects the best available
implementation for your platform (C or assembly); offers unified APIs for
families of algorithms (e.g. hashes); and its performance is on par with what
you’d find in, say, OpenSSL. In short, EverCrypt aims to offer a verified
cryptographic library that offers the same convenience as existing,
industrial-grade libraries, but with added verification guarantees.</p>

<p>A verified cryptographic library is of crucial importance: cryptographic
libraries are found in any modern software stack, and they’re incredibly hard to
get right. Bugs range from memory corruption and incorrect math to side-channel
leaks and even illegal-instruction errors. All of these can have incredibly
painful consequences. With EverCrypt, we hope to demonstrate that one can rule
out these errors with mathematical certainty, without compromising on the
feature set or performance.</p>

<p>The technical details, including a precise description of what we verify, are
available in the
<a href="https://github.com/project-everest/hacl-star/blob/fstar-master/README.EverCrypt.md">README</a>.
The high-level overview, with a human-readable explanation of what we hope to
achieve, is on the <a href="https://www.microsoft.com/en-us/research/blog/evercrypt-cryptographic-provider-offers-developers-greater-security-assurances/">MSR blog</a>.
For a more general perspective on software
verification, <a href="https://www.quantamagazine.org/how-the-evercrypt-library-creates-hacker-proof-cryptography-20190402/">Quanta
Magazine</a>
just released a very accessible article that talks about our work.</p>

<h2 id="what-were-hoping-for-with-this-alpha-release">What we’re hoping for with this alpha release</h2>

<p>First, an obligatory disclaimer: this is an alpha release, and important
features are missing, such as tests for non-X64 platforms; a C fallback
implementation of the AES algorithms; and many others I’m sure. We also have a
few admitted proofs here and there throughout our code which couldn’t be wrapped
up in time for the release. These will be addressed by the end of this release
cycle.</p>

<p>Nevertheless, the reason we’re doing an “informal” release is to gather feedback
about <a href="https://github.com/project-everest/hacl-star/tree/evercrypt-v0.1%2B/dist">the
code</a>,
even in its current state. Things we’d love to hear about include:</p>
<ul>
  <li>ease-of-use of our library from C</li>
  <li>whether the APIs are “idiomatic” or they could use some improvements</li>
  <li>integration or build difficulties</li>
  <li>things that could be improved to make this even more useful to C clients</li>
  <li>most-wanted algorithms / optimized implementations (do you crave an AVX2
SHA256 or an AVX Chacha20?).</li>
</ul>

<p>In terms of APIs, the most polished one is the hash API (<code class="language-plaintext highlighter-rouge">EverCrypt_Hash.h</code>);
let us know if there are improvements to be enacted for this style.</p>

<p>Some of the improvements on the radar are: a unified error code in
<code class="language-plaintext highlighter-rouge">EverCrypt.Error</code>; abstract C structs for all APIs, including Hash and AEAD; and
<a href="https://github.com/project-everest/hacl-star/issues/145">many more</a>.</p>

<p>Please get in touch via a <a href="https://github.com/project-everest/hacl-star/issues">GitHub
issue</a>! If sharing an
experience report publicly is not possible, private emails also work.</p>

<h2 id="a-look-behind-the-scenes">A look behind the scenes</h2>

<p>We call EverCrypt a “provider” because it unifies under a single interface two
strands of work from <a href="http://project-everest.github.io/">Project Everest</a>:
HACL*, a cryptographic library written in Low* which generates pure C
algorithms, and Vale-Crypto, a collection of assembly algorithms written in
Vale.</p>

<p>EverCrypt is, perhaps, the first project to come out of Everest whose ownership
is truly spread across all institutions currently working on Everest: Microsoft
Research, INRIA, Carnegie Mellon University. It means it’s also the first
instance in which we had to reconcile two independently-developed projects.</p>

<p>There were plenty of technical challenges which we will cover in an upcoming
paper submission: verifying both the Vale and HACL implementations against the
same specifications; crafting suitable interfaces to abstract away the
implementation details; doing multiplexing in a style that is friendly to the C
preprocessor; and many more.</p>

<p>But the challenge, perhaps, wasn’t so much in the verification itself, but in
<em>materializing</em> the union of the two projects; and for that, a lot of work
took place behind the scenes. We have, for the whole hacl-star repository,
110,000 lines of hand-written Low* and 25,000 lines of hand-written Vale;
the latter translate to 70,000 lines of (generated) F*. Getting those in a
single repository, efficiently verifying and building, was in itself a
challenge. Several contributor-weeks were devoted to a unified build system;
performance improvements at every level in F*; bringing back diverging
abstractions into shared library modules that could guarantee
safe interoperability; and many more mundane tasks that we’d all rather forget.</p>

<p>The work is still very much in progress and we have a long road towards a 1.0
release, but everyone has been admirably patient in the face of engineering and
performance challenges. My hope is that as the technology stabilizes, we’ll be
able to allocate more time to documentation and learning materials, and bring
onboard more contributors who will help us grow our body of verified code.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Today, we’re announcing a preview release of EverCrypt, a verified cryptographic provider that offers a comprehensive collection of cryptographic algorithms. EverCrypt automatically selects the best available implementation for your platform (C or assembly); offers unified APIs for families of algorithms (e.g. hashes); and its performance is on-par with what you’d find in, say, OpenSSL. In short, EverCrypt aims to offer a verified cryptographic library that offers the same convenience as existing, industrial-grade libraries, but with added verification guarantees.]]></summary></entry></feed>