Lessons from writing a compiler

Click for: original source

The standard academic literature is most useful for the extreme frontend (parsing) and the extreme backend (SSA, instruction selection and code generation), but the middle-end is ignored. This is fine if you want to learn how to build, e.g., the next LLVM: a fat backend with a very thin frontend. By Fernando Borretti.

Imagine a compiler as a 2D grid. The rows are the stages of compilation: from parsing, through the middleend, to the backend. The columns are the different facets of the programming language being implemented. For a smaller language like C, the different facets are: global variable declarations, enum declarations, type declarations, function declarations, and nothing else. More featureful languages add extra facets: interface and class definitions in C++, trait and impl definition in Rust, etc.

This article contains some of the lessons I learned writing the compiler for Austral, a new systems programming language with linear types that I’ve been working on for a while. The first few sections are high-level, the rest most specific to using OCaml to write a compiler.

The article then dives into:

  • Implementation strategies
  • Just use OCaml
  • Just use Menhir
  • What tests are useful
  • Compilation model
  • Code organization
  • Bootstrapping issues
  • The environment
  • Identifying declarations
  • Errors
  • Remarks

There are, broadly, two ways to implement a compiler. In the waterfall strategy, each stage is implemented one after the other: the complete parser, then the complete semantic analysis pass, then the complete type checking pass, and on to the backend stages. Excellent read!

[Read More]

Tags miscellaneous programming oop app-development performance data-science