Homework 1: Coq & CompCert
Due date: February 8, 11pm
Grading: 5% of your course grade; 2% for Step 1 and 3% for Step 3
Coq is an interactive theorem prover widely used for software verification. We'll see applications of Coq in this course to compilers, file systems, and distributed systems, but there are many others, including formalizing mathematical proofs. There's a great community-maintained list of projects using Coq.
In this homework, we're going to work with the CompCert verified C compiler, developed in Coq. We read the original 2009 paper about CompCert in class, but it's still going strong, and is used in production settings where strong correctness guarantees are essential, such as aviation.
The goal of this homework is to add a simple peephole optimization to the RISC-V version of CompCert and prove it correct. RISC-V is an open-source instruction set architecture (like x86 or Arm), originally from Berkeley, but now being supported by a variety of companies and used in many (mostly embedded) applications. A peephole optimization is a small compiler optimization that can be made without detailed knowledge of the context for the code. We'll be reading more about automatically verifying peephole optimizations in the Alive paper, but for this homework we'll be doing it by hand in Coq.
Table of contents
Preparation
First, we'll need to install some prerequisites for CompCert: OCaml, its package manager opam
, and Coq itself (version 8.14).
We'll also need to install a RISC-V compiler, since we're going to be adding our optimization to CompCert's RISC-V backend—while
CompCert provides the compiler, the existing toolchain provides the linker, assembler, etc.
(as an exercise, think about how this choice affects CompCert's safety guarantees).
I've tested these instructions, but everyone's system is a little different. If you get stuck anywhere before "Step 1" of the homework below, please post on the Canvas discussion board, as it's likely others have had the same problem!
Mac
On a Mac with Homebrew installed, let's first get our hands on ocaml and opam:
brew install ocaml opam
We'll also need to use opam to get a necessary dependency for CompCert:
opam init -a
eval `opam env`
opam install -y menhir
Next, let's install Coq. The latest release of CompCert requires Coq version 8.9–8.14, but the Coq version available in Homebrew is newer than that, so we need to be a little crafty to tell Homebrew we really do want an old version:
curl -o /tmp/coq.rb https://raw.githubusercontent.com/Homebrew/homebrew-core/a1765e91c8934bc8f7f29063945681ad6bd32ab0/Formula/coq.rb
env HOMEBREW_NO_INSTALLED_DEPENDENTS_CHECK=1 brew install --formula /tmp/coq.rb
Finally, we can install the RISC-V toolchain, also using Homebrew:
brew tap riscv-software-src/riscv
brew install riscv-gnu-toolchain --with-NOmultilib
Note that for Apple Silicon Macs, RISC-V doesn't yet provide precompiled toolchain binaries, so Homebrew will compile them from source. This is a good chance to grab a coffee (it took about 15 minutes for me).
Once all that is done, skip along to Choosing an IDE.
Linux
On Ubuntu 20.04 (and possibly other versions), this should be enough to get everything you need:
apt install build-essential opam ocaml coq gcc-riscv64-linux-gnu
opam init -a
eval `opam env`
opam install -y menhir
If all that worked, skip along to Choosing an IDE. For other distributions, or if that doesn't work, read on to install from source. We'll use opam to install OCaml by running:
sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)
opam init -a
eval `opam env`
Then install OCaml and some Coq dependencies by using opam:
opam switch create 4.12.0
eval `opam env`
opam install -y ocamlfind zarith menhir
Now we can install Coq from source (consider passing -j8
to make
to use 8 cores):
wget https://github.com/coq/coq/archive/V8.14.0.tar.gz
tar xzf V8.14.0.tar.gz
cd coq-8.14.0
./configure
make
make install
Finally, we can install the RISC-V cross-compiler. The easiest way is to get it from your package manager (it's gcc-riscv64-linux-gnu
on Ubuntu, Debian, and Fedora; search for riscv
on other distributions). If you can't find it there, you can download a binary toolchain (select riscv64-lp64d
as the architecture and glibc
as the libc).
Windows
You're on your own, sorry! The Windows Subsystem for Linux might work. Otherwise, I strongly recommend trying to get your hands on a Mac or Linux machine or a Linux VM for this homework—if this is an issue, let me know and we'll try to find you a suitable machine.
Choosing an IDE
Coq proofs are just text files like any other programming language, so you're free to use whatever text editor you prefer. But proofs are a lot easier to write with IDE support for showing you the proof context, exploring available lemmas, etc. I have three suggestions here:
- For Visual Studio Code, the VSCoq extension is good.
- For Emacs users, people like Proof General, optionally with the Company-Coq extensions, though I have no experience with them.
- Coq has its own CoqIDE tool, which is not the prettiest option, but gets the job done.
In all cases, you should set up CompCert (see below) before opening it in your IDE, so that the proofs are pre-compiled and fast to step through as you work on them.
Set up CompCert
We'll be using GitHub Classroom to check out and submit this homework.
Follow the GitHub Classroom URL on Canvas to create your private copy of the CompCert repository, and then clone that repository to your machine. For example, the repository it created for me is called hw1-coq-jamesbornholt
, so I would do:
git clone git@github.com:cs395t-svs/hw1-coq-jamesbornholt.git
cd hw1-coq-jamesbornholt
Now let's compile CompCert. The toolprefix
argument will differ depending on which riscv64 toolchain you got: the Mac version from Homebrew uses riscv64-unknown-elf-
, while the Linux version from the Ubuntu package manager uses riscv64-linux-gnu-
. (consider passing -j8
to make
to use 8 cores).
./configure -toolprefix riscv64-unknown-elf- rv64-linux
make all
If all goes well, you now have a ccomp
binary in your CompCert directory—that's the compiler!
Test the compiler
The task for this homework is to add an optimization to CompCert that rewrites x + x
to x << 1
.
Before we do that, let's see what the compiler currently emits for these expressions.
Save this C program as test.c
:
int foo(int x) {
return x + x;
}
int bar(int x) {
return x << 1;
}
Now compile it to assembly with:
./ccomp -S test.c
In the output file test.s
, you'll notice the difference between the two functions is that foo
uses addw x10, x10, x10
while bar
uses slliw x10, x10, 1
(RISC-V's shift-left immediate instruction for 32-bit operands). x10
is both the first argument register and the first return value register in the RISC-V ABI.
Step 1: Add the optimization
We're going to add a peephole optimization that rewrites x + x
to x << 1
.
The first decision we have to make is: where should we add this transformation?
Recall that CompCert is organized into a series of compiler passes,
starting from (essentially) C and transforming all the way down to assembly.
We'll add our optimization to the CombineOp
function,
which is part of the common subexpression elimination (CSE) RTL-to-RTL pass
that uses value numbering to eliminate redundant expressions.
This is a nice place to make our change because one of the things we'll need to do in our optimization
is identify instances of x + y
where both operands are the same—value numbering is an easy way to implement that check.
Open riscV/CombineOp.v
from the CompCert directory.
This file has a combine_op
function that implements the pass.
It takes as input an operation and a list of the operation's arguments as valnum
s,
which are abstract identifiers for values—if two valnums are equal, the values they reference to are know to be the same,
otherwise they might be different.
Modify this function to rewrites x + x
into x << 1
.
It will be easiest to add this as the last match arm before the catch-all _, _ => None
.
You'll probably need to look at the definition of the operation
type to figure out which operations to match on and to emit.
You'll know you're done with your change when make
no longer fails on COQC riscV/CombineOp.v
(it will still fail, just later in the build).
Then you can proceed to Step 2 below.
A couple of other hints:
- You can refer to the value
1
of typeint
asInt.one
(orInt.repr 1
). - You can't check if two values of type
valnum
are equal using=
, because Coq does not assume equality is decidable in general. Luckily, CompCert already proves that equality ofvalnum
s is decidable. You'll need to write something likeeq_valnum a b
instead ofa = b
.
Step 2: Skip the proof and test the optimization
Once you've extended combine_op
with the new optimization, we can test it.
If you've successfully added your optimization, make
will fail with a message like:
File "./riscV/CombineOpproof.v", line 171, characters 0-4:
Error: (in proof combine_op_sound): Attempt to save an incomplete proof
CompCert is a verified compiler, and we haven't yet proven our optimization correct!
The proof of combine_op
is combine_op_sound
, which lives in riscV/CombineOpproof.v
.
Notice that there's one case in the proof for each arm of compile_op
.
However, one nice thing about CompCert (and many other verified tools) is that the proof is optional; the code will still run, but it won't have the safety guarantee we want until we fill in the proof. This is great for testing the code before investing in an expensive proof of it, which might turn out to be futile if the code is wrong!
Let's take advantage of this flexibility by skipping the proof for a moment so we can test our compiler change.
Assuming you added a new arm to compile_op
as the last arm before the catch-all,
you can extend the proof by adding a new final case right before Qed
.
Eventually we'll fill in a proof here,
but for now just write:
- admit.
and replace the Qed
with Admitted
to tell Coq that you realize the proof isn't finished yet.
Now run make
again and it should succeed.
Run ccomp
over the test.c
file above again.
If all went well, the foo
function in the test.s
output should now be using slliw
, and you can proceed to Step 3.
If not, don't go any further, as your proof likely won't work! Try to figure out what you got wrong in combine_op
.
Step 3: Finish the proof
Let's go back and fill in the proof so that we can restore CompCert's safety guarantee. It's good to be modular when writing proofs, so we'll do this in two parts: first some lemmas and then the final proof.
Fill in some lemmas
You're going to want to prove a few lemmas that show that x + x == x << 1
at various layers of reasoning (CompCert's val
and int
, and Coq's Z
type—you might find these links handy to find lemmas you can use about each type).
Insert these lemmas in riscV/CombineOpproof.v
before combine_op_sound
, and then fill in their proofs
(at which point you can replace Admitted
with Qed
in each lemma):
Lemma Z_shl1_add:
forall (x: Z), Z.shiftl x 1 = x + x.
Proof.
(* my proof is ~4 tactics *)
Admitted.
Lemma Int_shl1_add:
forall (x: int), Int.shl x Int.one = Int.add x x.
Proof.
(* my proof is ~4 tactics *)
Admitted.
Lemma Val_shl1_add:
forall (x: val), Val.shl x Vone = Val.add x x.
Proof.
(* my proof is ~8 tactics *)
Admitted.
A couple of hints for this step:
- Notice how the three types are defined in the links above:
int
is defined in terms ofZ
, andval
is defined in terms ofint
(among others). That should tell you how these lemmas will interact: you'll likely want to use theZ_shl1_add
lemma aboutZ
in the proof ofInt_shl1_add
aboutInt
, and in turn useInt_shl1_add
to proveVal_shl1_add
. - You'll need a variety of tactics in these proofs, although none of them should require induction. There are some handy tactic references on the internet: this one by Joe Redmon is (in)famous, or there's this cheatsheet from Cornell.
- One tactic not covered in these references is
f_equal
, which lets you transform a goal of the formF a = F b
intoa = b
, since if you can prove a = b then you also know that f(a) = f(b).
- One tactic not covered in these references is
- The
Val_shl1_add
proof is a little trickier than the others because<<
is undefined if the shift amount is greater than or equal to the width of the result type. Of course, we're only shifting by 1, and working with 32-bitInt
s, but CompCert's implementation of<<
is parametric in the word size. You'll need to do a little work in your proof to convince Coq that the undefined case can't happen in our setting. There are a few ways you could do this. One would be to prove a separate lemma that the wordsize is greater than 1. Another would be to use thereplace
orassert
tactics to assert that it's true (and be forced to prove that as a separate goal).
Extend the top-level proof
Finally, we can complete the proof of combine_op_sound
by using Vshl_1_add
plus a couple more tactics (a total of 3 tactics in my proof).
You'll know you're done when this file no longer contains admit
or Admitted
, and make
succeeds.
What to submit
Submit your solutions by committing your changes in Git and pushing them to the private repository GitHub Classroom created for you in the Set up CompCert step. If you haven't used Git before, Chapters 1 and 2 of the Git book are a good tutorial.
The only files you should need to modify are riscV/CombineOp.v
and riscV/CombineOpproof.v
. GitHub will automatically select your most recent pushed commit before the deadline as your submission; there's no need to manually submit anything else via Canvas or GitHub.
Credits
This exercise is based on one from UW's CSE 599W — thanks!