Milawa:  A Self-Verifying Theorem Prover

J Strother Moore

Abstract: In this talk, I present the work of Jared Davis, who in 2009
completed a PhD at the University of Texas at Austin.

Jared has produced a Boyer-Moore style theorem prover (a wad of code that
claims to be a theorem prover without offering any formal proof object) that
has been proved to be sound (or, specifically, faithful to a trusted proof
checker).  In particular, if the ``wad of code'' claims to have proved a
formula then there exists a proof of that formula that can be checked by a
trusted proof checker (resource constraints aside).  That alleged proof does
not have to be generated; Jared proves that it exists.  Furthermore, Jared's
existence proof has been checked by that trusted proof checker.  The result
is that we can have the same confidence in his Boyer-Moore style prover as we
do in the trusted prover.

The trusted proof checker is necessarily simple.  A consequence is that the
proof objects it checks are typically large because each inference step is
tiny.  But a Boyer-Moore style theorem prover is a very complicated piece of
software, its proof of correctness is large even in high-level logical
systems with powerful derived rules of inference.  Jared's solution to this
is to bootstrap his way up: use the trusted prover, level 1, to prove the
correctness of a level 2 proof checker which includes, for example, derived
rules for propositional calculus.  His proof of correctness for level 2
establishes that any level 2 proof can be translated into a checkable level 1
proof.  To prove this he verifies a constructive translator from level 2 to
level 1.  He repeats the process 10 times, eventually getting a level 11
prover and a proof that any level 11 proof can be translated into a checkable
level 1 proof.  Even though he decomposes the proof in this fashion, his
formal proof objects are still very large: one checked proof in the
transition from level 11 to level 10 was 1.9 GB (although he eventually found
a way to reduce the size of this proof).

Perhaps most interestingly, at each stage he actually uses the untrusted
theorem prover to prove its own correctness, then he then uses the translator
to produce a proof at the next lowest (trusted) level, and actually checks
that proof.  These proofs are the only ones that ever have to be generated or
checked.  When all is said and done, the level 11 system -- which was used to
generate a proof of its own correctness -- need never produce a proof object
again.  Therefore, Jared has not only produced a Boyer-Moore style prover but
demonstrated its efficacy by using it at a decidedly non-academic scale: to
prove its own correctness.

After the presentation we can discuss the trade-offs between verifying the
theorem prover versus (effectively) checking every proof step at runtime.  Of
course, this dissertation points the way to the smooth combination of the
two, in which self-witnessing user-supplied tactics are used to extend a
substantive, verified Boyer-Moore prover while preserving the certainty of
fidelity.