A case study with Aeneas and jxl-rs
Aeneas is a toolchain for verifying Rust programs, relying on a functional translation to Lean. Aeneas is now nearly five years old, and many excellent improvements have landed recently, thanks to the work of Son Ho, Aymeric Fromherz, Guillaume Boisseau and many other collaborators. With this blog post, I’m hoping to showcase how proofs in Aeneas look like these days, and highlight recent improvements, by relying on a sample problem given to me by my colleague Luca Versari. I think this blog post can also serve as a nice example / tutorial on how to work with Aeneas on a real-world piece of code.
Before going any further: more info about Aeneas is available on GitHub, and most communication happens on our Zulip – subscribe to the newsletter channel there if you’d like to stay updated. Finally, the science is in our ICFP’22 and ICFP’24 papers.
Background: jxl-rs
jxl-rs is a rewrite of the jpeg-xl codec, in Rust. For performance reasons,
parts of the rewrite do rely on unsafe code. In this blog post, our goal is to
show that this particular
unchecked_get
is safe, that is, that the index is always within bounds.
Aeneas does not have much support for unsafe code; but unchecked accesses
such as the one above are modeled like regular accesses: one must show that the
index is within bounds in order to show that the function terminates with a
proper result (the .ok case in Aeneas). In other words, if we can prove that
read produces an .ok, then we have shown that that the array access is
always within bounds.
Running Aeneas
I packaged the proofs here as a toy
project. To run Aeneas, we must first produce an LLBC file, that is, a cleaned
up version of the MIR representation of the Rust compiler. This is done
by invoking
the Charon tool, passing it a --start-from flag to extract read (our target
function), its
transitive dependencies, and nothing else. See our
paper for more info on Charon and LLBC.
The next step is to run Aeneas
itself, to
produce a Lean representation of the Rust code out of the LLBC. We pick the
lean backend and generate multiple files, thus separating functions, types,
and proofs.
Aeneas is tied to a particular version of Lean: we adopt the lean-toolchain provided by the Aeneas repository.
Missing functions
The first thing we notice is that the project does not compile (i.e. no lake
build), because some Lean definitions are missing. This is because
the corresponding Rust functions are missing from the LLBC. This happens if e.g.
the functions live in a
third-party crate, or if there is no corresponding --include flag passed
to Charon, which would force extraction to traverse crate boundaries. The
signatures of these functions are emitted by Aeneas as axioms in a Lean (file
FunsExternal_template.lean), but no body is provided: Aeneas simply does not
know the expected behavior of an external function.
The user can assert facts about an external definition; one
easy way to do this is by providing a function body, in other words, by writing
a trusted model of the external function. Here, we
rename FunsExternal_template.lean into FunsExternal.lean, and
fill
out
the function definition. We are now asserting that the external
function read_u64 behaves like its model. The Aeneas-specific scalar_tac
comes in handy to prove that the result is within bounds.
With that, and a little bit of
boilerplate,
the project builds. We create a Proofs.lean file and add it to the main
JxlProofs.lean to make sure it is picked up by the build.
Background on how proofs work in Aeneas
Proofs in Aeneas/Lean rely on two key features.
The first feature is the monadic encoding of Aeneas, which translates a Rust function to a monadic Lean function, i.e. one that relies on the do-notation. See Funs.lean for examples of this. In short, monadic operations (denoted by a left-arrow ←) encode the possibility of failure, e.g. if an index is outside of bounds (line 47), or if a left-shift exceeds the bitwidth (line 49).
The second feature is the step tactic and corresponding step lemmas. To reason
about what it takes for such a function to successfully execute (i.e., reach the
.ok case and never the .panic case), the step tactic goes over each
monadic operation (a shift, an index, etc.) and collects the required proof
obligations along the way (that the bitwidth does not exceed the width, or that
the index be within bounds). After running the step tactic, the proof engineer
is left with a series of subgoals – successfully discharging all of these
subgoals establishes that the function terminates without panics.
Invariants
Before we actually look at a full example of stepping and proving panic-freedom, we start in a bottom-up fashion and present our data structure invariants.
The Proofs.lean file (presented here in its final state) starts with invariants on our data
structures.
Some of those were present in English (as Rust comments), and some others were found
by asking the author of the code for additional properties once I realized that
my hypotheses were not strong enough to conduct my proofs.
For the sake of this blog post, we are only doing partial verification: in a
real project, we would make sure the invariants are properly established at data
structure-creation time, and properly maintained throughout all the code-paths
that lead to the read function we are currently studying.
Helper lemmas
The next section contains a bunch of helper lemmas that I either could not find in Lean’s standard library, or that are simply missing. We decorate them with:
simpattributes, so that they are handled bysimpwithout passing them explicitlyscalar_tac, to make them trigger automatically in calls toscalar_tacwith a suitable patterngrind_patterns, to make them available togrind, also with a suitable pattern – note that we add them to the Aeneas grindset (agrind) so that they participate in automatically discharging subgoals while callingstep*– more on that below.
Again, those were found throughout the course of writing the proofs and this
section grew incrementally – I am simply presenting the final state of the
Proofs.lean file. One interesting tidbit is the case_usize helper, a
particularly annoying lemma that states that a property true both in the 32- and
the 64-bit cases is true for any usize. With the help of Claude (thanks Son!),
we now have a quite concise and effective proof.
The theorems showcase some of the Aeneas tactics: bvify, bv_tac,
scalar_tac, all fine-tuned to work well on the kind of goals produced by
dealing with machine integers.
This is typically the kind of boring, mundane and tedious lemmas that one would do well to prove with AI. Personally, I find those supremely boring.
Proving the absence of panics
Let’s now move on to something slightly more interesting, and write our first specification.
Again, for the sake of example, we focus on the absence of panics. Our
specification is thus going to be quite simple:
we set out to prove Hoare triples of the form f ⦃ r => True ⦄, i.e. function f
terminates with successful result r, no further properties stated (True).
A first fun example is read_u64_spec, presented
here.
It simply states that, provided that there are at least 8 bytes in
the slice passed as an argument (the hypothesis h over the argument bytes),
read_u64 terminates without panics. The proof script itself is quite
simple: we unfold the definition, step over all of the monadic steps of
read_u64 itself, and use the <;> combinator to apply the grind tactic on
all the remaining subgoals. I highly recommended following along in your
favorite editor to see the effects!
One key thing to note: read_u64_spec is itself decorated with @[step]. It
means that callers of read_u64, when they use step themselves, will be
able to leverage this theorem and generate a subgoal (“there are at least eight
bytes in the slice passed to read_u64”) to show termination.
A second fun example is the refill_slow_loop_does_not_panic
theorem,
which proves that the function always terminates, no panics, and no further
hypotheses required.
This proof showcases a recent addition to
Aeneas: reasoning about loops via combinators, rather than recursive functions.
Here, to show loop termination, it suffices to find a decreasing measure, and
because we only focus on the absence of panics, no further loop invariant is
needed.
Proof of termination for other functions is remarkably mundane, and step*
successfully chomps through the monadics steps of the Aeneas encoding. One thing
to note is that step* will automatically try to discharge subgoals
automatically: the subgoals you see after step* are those that could not be
discharged automatically. This is where annotating our earlier lemmas
with the correct attributes (agrind, etc.) pays off. Another interesting tidbit: the treatment
of global const declarations in Rust was recently improved in Aeneas, and now
with the proper
attributes,
one no longer needs dedicated simp [global_simps] calls to inline away
constants: this now happens automatically.
Trick: introducing a helper on the fly
Because Aeneas works on Rust code as written by the programmer, it is sometimes the case that it would be beneficial to introduce a helper, to factor out commonality in the code that we are looking at.
Here,
bucket_index
is a conceptual helper that our target function (read) does not use; however,
we wish to reason about it independently, for modularity of proofs. We write a
step
lemma
for it, this time with an interesting post-condition! Then, we introduce a
rewrite
lemma
that allows substituting a series of monadic operations that correspond to
bucket_index with a call to bucket_index itself.
This is very helpful to make proofs more modular.
Final proof
The final
proof
is relatively concise. Of note: we duplicate the main invariant so that both its
folded form and its unfolded form are available; and we rely on
bucket_index_eq to materialize the helper in our goal. Many of the steps rely
on grind, which is now officially supported by Aeneas.
Most of the subgoals are discharged automatically: the remaining ones rely on
additional facts about bounds that can be proven using bv_tac.
The final two subgoals share a lot of commonality: the same variables are in
scope and the same facts about those are needed in both cases. To share the
reasoning, I used all_goals.
A partial proof
There are many more interesting things that could be proven about this example.
Notably, the read function does not actually preserve the invariant: it only
does so as long as we have not reached the end of the input stream. This would
have to be reflected in its post-condition (omitted here), and we would have to
check that all callers handle the end-of-stream state properly.
Conclusion
I hope this brief example provides a glimpse of how proofs typically work in
Aeneas. With suitable annotations and judicious usage of grind, I believe
those proofs occupy a sweet spot: they enjoy a high degree of automation,
relying on SMT-like tactics (grind), or domain-specific tactics (bv_tac),
while still reaping the benefits of interactivity, namely, being able to see the
goal and debug why a proof is not going through.
