Perhaps the greatest surprise of the last two years was, for me, the realization that people not only care about compiling C to Rust (for obvious reasons, such as, ahem, memory safety) – they also care about compiling Rust to C! Wait, what?

I wrote about this briefly a couple years ago, but the level of interest for the project, I must say, took me somewhat by surprise. So let’s talk about compiling Rust to C a little more today.

Barriers to Rust adoption

Rust is making big progress in terms of adoption, and represents a great value proposition, especially for new code. Both my former employer and my new employer, like pretty much everyone else these days, have big projects that are written in pure Rust or can have Rust components. Even Windows kernel drivers can be written in Rust now. Amazing stuff.

However, if your project is, say, an open-source library that gets compiled on a wonderfully diverse set of target architectures, OSes, distributions and toolchains, well, chances are… one of these is not going to support Rust. Think of a crypto library: there will be people out there with an obscure compiler for a weird embedded target, and they really want to compile your library, because they’ve been told not to roll out their own crypto. Or perhaps you have a format library ridden with memory errors and you want to port it to Rust. Or maybe your company has an in-house analysis that only runs on C code. Regardless of the scenario, there will always be that one legacy use-case that prevents you from switching to Rust until it’s 2035, all those LTS versions (looking at you RHEL) are finally retired, and you yourself are too close to retirement to even care anymore.

That is, unless you’re willing to use a Rust to C compiler.

Why?

Having a backwards-compat scenario where Rust can be compiled to C serves several purposes.

  1. It allows for a gradual transition. The codebase can be ported to Rust, and refactored / cleaned up / rewritten to use all the nice Rust things (data types, pattern-matching, polymorphism, memory safety), thus making you and your developers much, much happier. Meanwhile, the C version co-exists so that you don’t alienate your userbase.
  2. It only requires maintaining a single version. The Rust code is authoritative; the C code is derived from it automatically, either on CI, or at least with a CI job that checks that the two are in sync.
  3. It allows for a census of problematic scenarios. By making the Rust version the default (and putting the fallback C behind a --write-us-an-email flag), there is finally a way to enumerate those mythical users who cannot switch to Rust just yet.

If that sounds appealing, meet Eurydice.

Eurydice

Eurydice is a compiler from Rust to C that aims to produce readable C code. Of course, readability is subjective; also, seeing that Rust relies on whole-program monomorphization, the C code is bound to be more verbose than the Rust code. But you can judge for yourself: here’s the result of compiling libcrux to C.

The output of the test suite is under version control, and there are a lot more tests to peruse. See for instance this bit, compared to the Rust original.

The design of Eurydice

Eurydice plugs in directly at the MIR level, using Charon to avoid reimplementing the wheel and paying the price of interacting with the guts of rustc. Our paper on Charon says more about its architecture.

The advantage of plugging in at the MIR level is that i) we do not have to interpret syntactic sugar, which means our translation is more faithful to the Rust semantics, and ii) we have way fewer constructs that need compiling to C. Even then, it’s no easy feat to translate Rust to C.

There is naturally, the need to perform whole-program monomorphization, over types and const-generic arguments; the compilation of pattern matches into tagged unions; recognizing instances of iterators that can be compiled to native C for-loops. Then, there are more subtle things, such as compiling array repeat expressions sensibly – zero-initializers when possible, initializer lists otherwise, unless it generates too much code, in which case for-loops are preferable. And finally, there are all the rules about visibility, static, inline, etc. that are very C-specific and depend on how you want to lay out your C files.

The translation is complicated by the constraint that the generated code ought to be readable: for instance, we compile Rust structs to C structs, including DSTs, by relying on flexible array members. We also work hard to avoid using the fully-generic tagged union pattern when possible, instead eliminating the tag when e.g. the Rust enum only has a single case. Additionally, we rely on Charon to reconstruct control-flow, rather than compile the MIR CFG to C code ridden with gotos; again, this is for code quality.

At a low-level, there were many interesting tidbits.

  • Because arrays in Rust are values, we wrap them within C structs to give them value semantics in C, too; concretely, [u32; 8] becomes struct { uint32_t data[8]; }. (A previous version of Eurydice would emit uint32_t *, and rely on various memcpys to implement value semantics, but this produced a translation that was not type-generic, and there were plenty of finicky corner cases. We revamped the compilation scheme recently.)
  • The notion of lvalue in C means we need to insert more variable declarations than in Rust – for instance, you can’t trivially compile &[0u32; 1] without naming the array.
  • The fact that the evaluation order is so loosely defined in C means that intermediary computations need to be stored in intermediary variables to enforce the evaluation order.
  • Rust relies on whole-program monomorphization; this means that the C code is inevitably going to contains multiple copies of the same types and functions, but for different choices of type and const generic argumnets. This is currently done with a builtin phase in Eurydice (for historical reasons), but in the long run, we want to rely on Charon’s support for monomorphization.
  • There are plenty of peephole optimizations that are required for good code quality, such as recognizing array::from_fn and generating sensible code that initializes the array in-place (instead of relying on the fully-general compilation scheme for closures), or recognizing instances of the Eq trait that deserve dedicated treatment (such as using memcmp for arrays and slices of flat data).

A final design choice is that for now, Eurydice may define more behaviors than Rust – for instance, Rust panics on integer overflow, but Eurydice-compiled code does not. This is because we assume the input code is verified, and therefore has been shown to be free of panics. This design choice can be easily changed, though.

In practice, as soon as you use traits, the C code becomes more voluminous than the Rust code. We rely on a configuration file mechanism to control the placement of monomorphized instances of a given function, rather than put everything in one big C file. This currently requires a lot of manual intervention to give good results on large projects.

Implementing of Eurydice

Eurydice starts by compiling the MIR AST obtained out of Charon into KaRaMeL’s internal AST. This is ~3000 lines of OCaml code, so that’s already pretty involved. A lot of the work revolves around trait methods and their monomorphization, given Rust’s expressive trait system.

Then, about 30 nanopasses simplify the KaRaMeL AST until it becomes eligible for compilation to C. Of those, a handful were originally written for KaRaMeL and were somewhat reusable; this includes compilation of data types, as well as monomorphization. The rest was written from scratch for Eurydice, and totals about ~5000 lines of OCaml code.

A particularly gnarly phase was eliminating MIR’s variable assignments as much as possible: in MIR, every variable starts out uninitialized at the beginning of the function; then, in lieu of the variable declaration, we have an assignment with the initial value. Naturally, having a variable declaration in the right spot is better for code quality, so an initial phase tries to reconstruct these assignments. That’s a drawback of using MIR, but we still firmly believe that sticking to something that has clear semantics is ultimately better.

Fun fact: because there are so many peephole optimizations, I got tired of maintaining enormous pattern-matches that would try to catch every flavor of Rust iterator that can be compiled to a C for-loop. Instead, a custom OCaml syntax extension allows writing concrete syntax for the internal KaRaMeL language in OCaml patterns. Those magic patterns then get compiled at compile-time to OCaml AST nodes for an actual OCaml pattern that matches the (deeply-embedded) syntax of KaRaMeL’s AST. This relies on a ppx that lexes, parses and compiles the concrete syntax.

Deploying Eurydice-generated code

Eurydice-generated code expects some hand-written glue that contains macros and static inline functions; sometimes, it’s simply more convenient to write a single macro that uses a type, rather than have Eurydice generate N copies of a polymorphic function that gets specialized each time. A typical example is compiling the Eq trait for arrays: it’s nicer to emit Eurydice_array_eq(a1, a2, len, t), which macro-expands to !(memcmp(a1, a2, len*sizeof(t))), rather than have N such functions, each containing a for-loop specialized for different values of t.

Eurydice generates code that is either (C11 and C++20-compatible) or (C++-17 compatible, but not C-compatible). The reason for this is that Rust allows enum values (e.g. Foo { bar: baz }) in any expression position. For simplicity, Eurydice emits a compound initializer (Foo) { .tag = bar, .value = { .case_Foo = { .bar = baz }}}, or a C++20 aggregate that uses designated initializers, relying on a macro (not shown here) to hide the syntax differences between the two. But C++17 does not have designated initializers, so there is an option for Eurydice to emit different code that relies on member pointers to achieve sensibly the same effect.

Limitations of Eurydice

Naturally, there are many limitations to this approach. Here are the main ones that come to mind:

  • we cannot guarantee that the layout of objects will be the same in C as in Rust; conceivably, one could parse the layout information from MIR, then emit compiler-specific alignment directives to keep the two identical, but this is not done currently;
  • the generated code violates strict aliasing, because creating a user-defined DST involves casting one pointer type (a struct containing an array) to another (a struct with a flexible array member instead); I’m not sure what the best fix is, so for now, please compile your code with -fno-strict-aliasing;
  • the code that Eurydice sees is MIR after applying cfg tweaks; this means that for code that is intended to be multi-platform, some tricks need to be applied, otherwise, Eurydice will only “see” one version of the code (AVX2, or ARM64, or something else)
  • because monorphization is so pervasive, the configuration language needs to express things such as “types that reference __m256i, an AVX2-only type, need to go into a separate file to be compiled with -mavx2”; this can get tedious real fast but I’m not sure I know how to do better.

What’s next?

There is ongoing work to integrate Eurydice-generated code for both Microsoft and Google’s respective crypto libraries.

The community grew recently, with wonderful contributions by GitHub users @ssyram and @lin23299. There are more in the pipeline, and I look forward to seeing the supported subset of Rust grow even more. Next on the horizon is support for dyn traits via vtables, and relying on Charon’s monomorphization to get MIR exactly as the Rust compiler would monomorphize it, intead of relying on a custom procedure in Eurydice.

An ambitious goal is for the whole standard library of Rust to be extractable via Eurydice in 2026. This is non-trivial, but I believe this achievement is within reach. Stay tuned.

PS: Why the name?

People keep asking about the name; because the project shares a large amount of infrastructure with Aeneas and Charon, I had to follow the Greek mythology theme. Specifically, the myth of Eurydice resonated with me: I thought I was saved from the hell of generating C code, and was going to go back to the world of the living, but alas, no.