A short history of Milawa's development, merely for posterity.
I don't know exactly when I started working on a project like Milawa, but the oldest file I found has a modification date of December 2, 2004, and that seems about right. During this time, I was still taking classes at UT. I probably first seriously talked to J about a verified theorem prover sometime in October or November of 2004.
I was quite confused back then. Looking back on the files I still have, I do find some recognizers for terms (of literals, variables, and function calls), but it looks like I didn't understand the difference between terms and formulas, and for some reason I called all of the formula operations "meta" operations.
I had written some substitution routines, and developed simple lists of the base axioms I wanted to have. I was apparently quite concerned about how to extended functions, probably due to wanting to extend the proof checker itself with new rules. I had tried to write the evaluator with the arguments (x defs) instead of (x defs n), and to address termination I have a long discussion about finding or providing a measure, etc.
For proofs, I had first intended to use a sequential structure with named steps, but in an early conversation J suggested trees and I remember being quite encouraged at how much better this seemed.
By mid February, the project had apparently evolved to include the notion of appeals, step checking functions, and a proof checking routine that called on these step checkers. In the last files from this time, dated March 4, 2005, I had arrived at a term-only logic with instantiation and axioms, but did not have any other rules, though comments mention the need for some kind of equality substitution rule.
I had lost interest in the proof checker around this time. I had some class projects to work on, and spent most of my ACL2 time working reading through the C99 standard and developing an ACL2 model of C's arithmetic expressions, which I would later abandon when that summer I headed to Iowa for an internship with Rockwell Collins.
The fall semester of 2005 would be the start of my 3rd year at UT, so during my time at Rockwell I was looking for a dissertation project. Near the end of my internship, I had some discussions with Dave Greve about the limitations of meta rules again got me interested in the verified prover, and I think I must have reread John Harrison's technical report.
Then, probably around the beginning of August, I drove alone for five hours to Minneapolis to visit some friends. The car didn't have a CD player, so rather than listen to rural Iowa radio stations I started discussing the details of the project with the highway. I remember becoming pretty excited after figuring out how I could use computational reflection, and I also remember coming up with the appeal-provisionally-okp function to separate out the step checkers from proofp. I'm pretty sure I convinced myself that this would be my project during that car ride.
My last weeks at Rockwell were sufficiently busy that I had no desire to work on a new project at home, so I didn't get started until I was back in Austin for the Fall semester.
I started working on Milawa when I got back to UT. I was only taking one class, Introduction to Logic, which gave me some pretty significant time to work on the project.
I started fresh. I wanted to use something close to the ACL2 logic, so I assembled Shoenfield's book, Shankar's dissertation, the ACL2 book, the structured theory paper, and J's "quick and dirty logic" paper, from which I was able to develop my term and formula recognizers and choose my rules of inference. I implemented the appeal system to represent proofs.
Some of the hardest work was to just come up with the basic derivations I needed. I think everyone who worked in the tower came by at some point or another and tried to help with some derivation or another. The worst was disjoined associativity. Sadly enough, that rule alone probably took me more than a week to figure out.
I also remember spending some time on ACL2 proofs. I know that in one of my meetings with J, I showed him how I could prove in ACL2 that a stack of three or four provers was sound. This was just using commute or, right associativity, etc. as the extensions. That's the first time I remember him being really excited about the project.
On November 16, I gave my first presentation about Milawa at the weekly ACL2 seminar. I walked through the logic and the proof checkers, and the ACL2 proof that proofp-2 (commute or) is sound. At that point I'd developed a litany of basic derivations and even developed a simplistic unconditional rewriter (no evaluation; equality only) that could simplify if expressions if given the right rules.
This was a pretty useful talk, and I walked away from it wanting more automation and worrying about evaluation. By Thanksgiving, I think I had recreated Shankar's functions for tautology checking, iff substitution, and equality substitution, and I built a tower with these extensions. And I think I can remember talking about my deduction rule extension at some party in December.
I spent quite a lot of time figuring out how to handle evaluation. I had begun to realize that evaluation needed to have a special role in the prover: to use a procedure reflectively, we need to be able to run it on a concrete proof object. But evaluation raised a lot of issues about partial functions, termination, and paradoxes involving (eval (eval ...)). I remember spending a lot of time thinking about rankings or other ways to characterize when it was safe to call eval, and I certainly wasted a lot of Matt's time with it.
Somehow out of this I came up with the idea of a base evaluation rule that would only apply to the primitives. I remember explaining some of the details to Warren and Matt on the chalkboard, and being pretty excited at how simple everything had become. I think I had implemented the evaluation extension by some time in March.
With evaluation in hand, I implemented a better unconditional rewriter, and went to J and Matt for advice about conditional rewriting. I probably spent most of April and May working on a simplifier and trying to start the bootstrapping process, but I didn't get very far on this. The simplifier had become very complicated and I had gotten too far from the builders. I couldn't really see how to do the proofs alongside its operation.
I gave a few talks about Milawa during this time. I made a poster for GradFest (where the prospective grad students come to see the university) and gave a talk to Warren's CyberTrust group in March. Then, in April, I gave a short talk to Victor Marek who was visiting.
For my verification class's project, Warren also challenged me to use my system to prove something "real", so I did some simple equivalence proofs for a toy circuit and presented the work to his hardware verification class. We also looked into speeding up the prover with his hash consing code during April, and we found some really promising results.
I was getting eager to propose, and under Warren's guidance I spent most of the summer working on the proposal. I spent a lot of time reading papers and looking at other systems, and trying to present my work clearly. I didn't spent much time on the prover itself until probably September.
I did give a talk on Milawa at the beginning of August at the ACL2 seminar, but it only covered the evaluation rule that I'd already developed that Spring.
As the fall semester started, the proposal was reviewed and informally considered acceptable, but it could be strengthened if I could just complete the first stage of bootstrapping.
My previous attempt at bootstrapping had not gone well, and after reading about tactic-based systems, I decided I had tried to be too ACL2-like, doing everything with a built-in simplification strategy and giving it hints when it got stuck. Maybe instead of trying to build a sophisticated simplifier, I could just build up simpler tactics and then do the bootstrapping more manually.
To get a feel for tactics, and also to branch out a bit, I spent some time working with Isabelle/HOL. I tried modelling my proof checker, figuring that if it worked well I could perhaps prove it sound in a higher order system. I even gave a little talk on Isabelle to the ACL2 seminar in October, after which I decided this had been a bad digression and had taken me off task.
Still, I think this was a useful in giving me a feel for tactics. I figured out that I could use clauses instead of sequents as the basis for tactics. I developed an initial system of proof "skeletons" that could later be filled in by a tactic compiler, and wrote some simple tactics for removing trivial and duplicate literals from clauses.
This seemed like it would work well, so I decided that my first interesting tactic would be a clausifier and if-lifting program. It wasn't easy to see that the algorithm terminated, and unfortunately the proof was interesting enough that I kept at it. I probably spent two or three weeks on this alone. Eventually a working and verified clausifier was finished.
After the clausifier was done, I returned to rewriting. I was ready to throw out the old conditional rewriter and take a more measured approach in developing it. I started by rewriting the unconditional rewriter to extend it with iff reasoning and to make it use the same rule structures the conditional rewriter uses.
It became difficult to manage the supporting derivations because of how long they were. I spent quite awhile developing a "defderiv" macro system that would make it much easier to introduce derivations, and also improved this macro to automatically write LaTeX versions of derivations for direct inclusion in the proposal or other documentation. I finished the unconditional rewriter and its builders proofs, though these were a little difficult due to massive case splitting.
February and March were really a couple of productive months. In this time, my difficulty with the unconditional rewriter led me to the tracing system, and I implemented and carried out its proofs. Also during this time, I implemented the beginnings of my user interface, along with several tactics for splitting, generalizing, unconditional rewriting, and induction. There were a lot of things that went into this, but by March 27 I had carried out a Milawa proof of natp-of-len, my first inductive proof. The tools I had were primitive, but the basic interface and bootstrapping approach was largely similar to the final setup.
My attention then turned to developing the conditional rewriter and its assumption system. By the end of April, I had developed the assumptions system and had a working rewriter, and had translated all of the proofs in primitives.lisp and part way through utilities.lisp.
Bootstrapping is largely a firefighting process. You try to translate some proofs. When you have trouble, you add tools to help you figure out what went wrong (like tracing and profiling), then try to figure out what you need to add or do differently. This process led me to implement iff-based assumptions and rewriting, ancestors checking, new tactics for car-cdr elimination, improve the sizes of various builders for splitting, etc.
By May 25, I'd finished bootstrapping through the utilities directory. With my tools improved from that effort, it took only about another week to get through the logic directory and verify a demo extension (with just the commutativity of or).
This was a big milestone, and after it was done I spent a good while working on getting the "core proof checker" to run on a bunch of Lisps and actually check the proof.
By the end of August, my tools were getting a lot better. I had extended the rewriter with free-variable matching, forcing, and caching. I added %autoprove to the bootstrapping interface to avoid copying and pasting goals, and developed the provisional certification mechanism to speed up re-certification (which was really important, since my tools were still changing a lot and this was causing me to work on updating proofs.)
I spent a lot of time in August and September working on a journal paper that described my rewriter, which ended up being rejected. This was disappointing, but it was useful when it came time to write up the rewriter for my dissertation.
In October I gave my proposal, which took a bit of time. By the time of the proposal, I had finished the bootstrapping through Level 4, which gave me the ability to say that the approach indeed seemed practical, and to highlight the reduction in proof size after introducing the intermediate levels.
The following months were good times. There was a lot of work to manage and avoid large proofs, and it was a chore to manage proofs. There were also some distractions; I took a few weeks to learn Ruby and write the addjob program to be able to use more computers. Also, at the end of November my wife and I bought our home, and we spent the better part of two months working on that. At any rate, by the end of April, I had finished bootstrapping through Level 8, and was feeling that the project was getting close to done.
In the summer I worked full time at Centaur, and spent almost no time working on Milawa. The only thing I really did during this time was writing the fast rewriters.
My internship at Centaur continued through the fall, but I reduced my hours to 30 per week so that I could begin work on the dissertation. I spent some minor time with the code, but was mostly writing. Our daughter, Calista, was born in November, so I had a fair amount going on.
Throughout the Spring, I did a great deal of writing. I did not really return to the new coding until May. To make the dissertation better, I wanted to finish the fast rewriter proofs, and this was quite hard and took a lot of time.
In August, very near my deadline, I finished those hard proofs. Then, in about a weekend, I did all of the bootstrapping for Level 11, and the project reached its final state.
Most of my effort was spent writing the dissertation. I also worked on improving the code somewhat during this time, and really finished the boostrapping. J and Matt gave me tremendously useful comments on drafts, and I managed to answer most of their questions.
I defended in September and somehow passed. It was good to meet John Harrison, who gave me many useful comments and corrections. We celebrated with a nice dinner. After a few weeks break, I finished the corrections and received the blessings of the ruler lady, and that was that for grad school.
Late one night I discovered Magnus's dissertation, and was shocked at (1) how cool it was, and (2) the similarity between his toy Lisp and Milawa's subset of Lisp. I contacted Magnus and we began a (very one-sided) collaboration, which ultimately resulted in the development of a verified Lisp that can run Milawa. Magnus eventually carried this through the very, very impressive HOL result showing that Milawa's kernel is sound.