The first general-purpose automatic theorem prover explicitly designed for the mathematics behind program verification was the Edinburgh Pure Lisp Theorem Prover, otherwise known as PLTP. PLTP was created in 1972 and 1973 by Boyer and me while we were in the Department of Computational Logic (formerly, the Metamathematics Unit), School of Artificial Intelligence, University of Edinburgh, Scotland. It was a direct response to the inadequacies, for program verification purposes, of the then-popular resolution-based first-order provers (upon which we had been working). The Annotated Bibliography below provides scanned images of the 1973 POP-2 code for PLTP, its input, and its output, as well as links to other relevant material including POP-2 reference manuals, my dissertation, and articles about PLTP published in 1973 and 1975. In addition, we provide two modern (2017) recreations of PLTP, one in ACL2 and the other in OCaml. But first I would like to describe the origins of PLTP and of this archive.
For example, we could prove that there exists an X such that the recursively defined (LENGTH X) is equal to 3. But we could not prove that (LENGTH (APPEND X Y)) is (+ (LENGTH X) (LENGTH Y)). The latter conjecture requires mathematical induction.
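To see why induction is needed, here is a brief sketch in the same notation as the conjectures above. The recursive definitions of LENGTH and APPEND shown are the usual ones and are my illustration, not a transcription of anything we actually ran.

  ;; Usual recursive definitions (illustrative):
  (LENGTH X)   = (IF X (+ 1 (LENGTH (CDR X))) 0)
  (APPEND X Y) = (IF X (CONS (CAR X) (APPEND (CDR X) Y)) Y)

  ;; An existential claim such as ``there is an X with (LENGTH X) = 3'' can be
  ;; settled by exhibiting a witness and computing.

  ;; But (LENGTH (APPEND X Y)) = (+ (LENGTH X) (LENGTH Y)) must hold for ALL X and Y.
  ;; For non-empty X, unfolding the definitions gives
  (LENGTH (APPEND X Y))     = (+ 1 (LENGTH (APPEND (CDR X) Y)))
  (+ (LENGTH X) (LENGTH Y)) = (+ 1 (+ (LENGTH (CDR X)) (LENGTH Y)))
  ;; so finishing the proof requires the very same claim, as an induction
  ;; hypothesis, about (CDR X).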
Our focus on proving properties of programs as a test bed for our efforts in resolution led us by 1972 to the realization that we needed to build a prover capable of induction.
The standard approach to program verification in the early 1970s was to (a) annotate a program with assertions, (b) use one of several methods to generate ``verification condition'' formulas, and then (c) use an ad hoc simplifier or a resolution theorem prover to attempt to prove those formulas. It was not uncommon for researchers of the day to add ``lemmas'' or specially tailored ``axioms'' to the set of input assumptions in order for the prover to succeed.
The reason for this is one we pointed out in the Preface of our 1979 book A Computational Logic (Academic Press):
One reason that researchers have had to assume ``lemmas'' so freely is that they have not implemented the principle of mathematical induction in their theorem-proving systems. Since mathematical induction is a fundamental rule of inference for the objects about which computer programmers think (e.g., integers, sequences, trees), it is surprising that anyone would implement a theorem-prover for program verification that could not make inductive arguments.

We speculated on the possibility that researchers (unconsciously) assumed that the only role for induction was to unwind loops and handle procedure calls — roles fulfilled by the program semantics. But regardless of why, it is a fact that induction was not supported even though many verification conditions involved properties of inductively constructed objects.
Thus 1972 found us proving theorems about recursive list-processing functions, by hand on the blackboard, and discussing the techniques and heuristics we were using. The main questions were which of those techniques were effective and how they could be mechanized.
These exercises soon led us to implement our methods in the Edinburgh Pure Lisp Theorem Prover, PLTP. Briefly put, PLTP was a fully automatic theorem prover for properties of recursive functions over list structures. Since our focus was on induction, we limited the language to a homegrown version of Pure Lisp supporting just the object NIL and ordered pairs constructed by CONS. The language included CAR and CDR, a 3-place IF (actually named ``COND'' in early work), and EQUAL. Natural numbers were just abbreviations for lists of NILs, giving rise to Peano-like definitions of arithmetic. The user could define recursive functions and challenge the prover to prove properties. It was not possible for the user to offer hints, lemmas, or other help.
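For concreteness, here is a sketch in the square-bracket notation of the 1973 listings. The number representation follows the description above; the addition function and its name PLUS are my illustrative reconstruction, while LTE is copied verbatim from the proof log later on this page.

  ;; The only objects are NIL and CONS-pairs, so the natural number n is the list of n NILs:
  0 = NIL
  3 = [CONS NIL [CONS NIL [CONS NIL NIL]]]

  ;; A Peano-like addition on that representation (hypothetical name and body):
  [PLUS [LAMBDA [X Y] [COND X [CONS NIL [PLUS [CDR X] Y]] Y]]]

  ;; Compare LTE, the user-defined Peano ``less than or equal'' from the proof log below:
  [LTE [LAMBDA [X Y] [COND X [COND Y [LTE [CDR X] [CDR Y]] NIL] T]]]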
In this setting we could explore automatic proofs of the most elementary results of elementary number theory, as well as properties of sequences and trees and of functions that mixed those theories; that is, we could pursue the kinds of proofs required by program verification.
Roughly speaking, PLTP took an input formula (as a term) and simplified it by repeatedly applying ``symbolic evaluation'' (see EVALUATE in Listings F and H), ``rewriting'' (see REWRITE), and ``reduction'' (see REDUCE) until it no longer changed. After simplification, PLTP applied ``fertilization'' (see FERTILIZE), ``generalization'' (see GENRLIZE), and induction (see INDUCT).
Fertilization replaced one side of an equality induction hypothesis by the
other in certain occurrences in the induction conclusion and ``discarded''
the hypothesis. Generalization replaced some terms by new variables,
possibly restricting the values of the new variables to satisfy certain automatically synthesized
``type functions.'' Finally, selection of an induction schema was driven by
the failures of symbolic evaluation to expand calls of recursive functions in
the conjecture.
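To illustrate fertilization, here is a schematic reconstruction (not a transcription of 1973 output) of the induction step for the associativity of list concatenation, one of the proveall theorems, assuming the usual definition [APPEND [LAMBDA [X Y] [COND X [CONS [CAR X] [APPEND [CDR X] Y]] Y]]].

  ;; Induction step after inducting on A (schematically):
  [IMPLIES [EQUAL [APPEND [APPEND A B] C] [APPEND A [APPEND B C]]]
           [EQUAL [APPEND [APPEND [CONS A1 A] B] C] [APPEND [CONS A1 A] [APPEND B C]]]]

  ;; Symbolic evaluation of the APPENDs applied to [CONS A1 A] reduces the conclusion to
  [EQUAL [CONS A1 [APPEND [APPEND A B] C]] [CONS A1 [APPEND A [APPEND B C]]]]

  ;; Fertilization then replaces one side of the hypothesis equality by the other inside
  ;; the conclusion and ``discards'' the hypothesis, leaving an equality of identical
  ;; terms, which simplifies to T.  (Exactly which occurrence PLTP replaced, and whether
  ;; rewriting first strips the matching CONSes, is glossed over in this sketch.)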
We structured the 1973 PLTP somewhat differently than our subsequent NQTHM and ACL2 provers. Let me digress to point out some design choices that readers familiar with our more recent provers may find surprising.
PLTP represented the conjunction of subgoals as a single term built with IFs. That is, all of the subgoals of a proof were expressed in a single term (see the sketch after these design notes). We didn't introduce clauses, the ``waterfall,'' and the ``pool'' until the prover described in A Computational Logic (Academic Press, 1979).
The process of repeatedly applying EVALUATE, REWRITE, and REDUCE until the result was stable was called by the non-word ``normalation'' in the early copies of Moore's dissertation and in the 14 September, 1973, source code (see the definition of NORMALATE on page 14 of Listing-F). However, Moore's PhD examining committee (Bernard Meltzer, Rod Burstall, Robin Milner, and David Cooper) asked us to name the process with an English word. We chose ``simplification'' and we defined the function SIMPLIFY in subsequent versions of PLTP. See, for example, page 18 of Listing-A, printed 2 November, 1973.
REWRITE applied a few built-in rules, e.g., (CAR (CONS X Y)) = X and (IF X Y Y) = Y. But there was no way to store or use rules derived from previously proved theorems.
Symbolic evaluation (EVALUATE) used side-effects to maintain a group of global and locally-bound ``special'' variables to be used by induction. The main such variable was named ANALYSIS and it accumulated ``bomb lists'' recording recursive calls that blocked the symbolic expansion of recursive functions. ANALYSIS was accessed during induction processing by PICKINDVARS.
The induction schema PLTP used for [p X] when X is a linear list was:

[AND [p NIL] [IMPLIES [p X] [p [CONS X1 X]]]]

Our subsequent provers used a schema like this for linear lists:

(AND (IMPLIES (NOT (CONSP X)) (p X)) (IMPLIES (AND (CONSP X) (p (CDR X))) (p X)))

Obvious generalizations of this latter scheme allow one to provide induction hypotheses about arbitrary objects smaller than X, not just explicit structural components of X.
The function that applied fertilization, generalization, and induction was named ARTIFINTEL (for ``artificial intelligence'') because:

COMMENT 'THIS FUNCTION APPLIES FERTILIZATION AND IF THAT FAILS TRIES GENERALIZING AND INDUCTING. IT IS CAREFUL TO WORK ONLY ON THE FIRST CONJUNCT IF THE THEOREM IS A CONJUNCT. FOR THIS IT GETS THE NAME "ARTIFICIAL INTELLIGENCE", BEING ABOUT THE SMARTEST PROGRAM IN THE THEOREM PROVER.`;

In our subsequent provers, in which the goal was represented by the implicit conjunction of clauses in the pool, we generalized this architecture and collected simplification, fertilization, generalization, and, eventually, other processes like destructor elimination and irrelevance elimination into the waterfall with induction at the bottom. Since PLTP's ARTIFINTEL focused generalization and induction on the first conjunct, PLTP acted as though it had a pool and a (small) waterfall. Unlike our subsequent provers, PLTP eagerly simplified and fertilized all the formulas in the pool before resorting to generalization, etc.
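To make the first design note concrete: because AND and IMPLIES were non-primitives defined in terms of the 3-place COND (the two definitions below are exactly the ones printed in the proof log later on this page), a goal of the form [AND base [IMPLIES hyp concl]] was itself just one nested COND/IF term, so all of the subgoals really did live in a single term. The expansion shown is schematic and ignores whatever simplifications PLTP applied along the way.

  [AND     [LAMBDA [X Y] [COND X [COND Y T NIL] NIL]]]
  [IMPLIES [LAMBDA [X Y] [COND X [COND Y T NIL] T]]]

  ;; Expanding AND and IMPLIES in [AND [p NIL] [IMPLIES [p X] [p [CONS X1 X]]]]
  ;; yields the single term
  [COND [p NIL]
        [COND [COND [p X] [COND [p [CONS X1 X]] T NIL] T] T NIL]
        NIL]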
1973 was a very active time and many different experimental versions of the system were tried, mostly involving minor tweaks to heuristics. To maintain our sanity we had a regression suite of function definitions and theorems, called the proveall, and ran the modified code on the proveall every time we changed the system.
By the end of 1973 we had a reasonably large corpus of Pure Lisp definitions
and properties in our proveall regression suite. Among the properties proved
fully automatically were that list concatenation is associative, reverse is
its own inverse, and insertion sort is correct. The last was stated as two
theorems, that insertion sort produces ordered output and that, for
all x, the number of times x occurs in the input is equal to
the number of times x occurs in the output. Images of lineprinter
listings of PLTP's proveall and the proof output are included in the
bibliography below. (By the way, the second half of the correctness of
insertion sort — the part about SORT
preserving the number
of occurrences — is not proved in the 18 July, 1973 proveall listing we
have; indeed, it is marked with a *
in the input, indicating that
it was a challenge we were working on. It is listed among the successful
theorems in the dissertation some months later.)
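In PLTP's notation the first of those theorems is the goal [ORDERED [SORT A]] shown in the proof log below. The second can be rendered schematically as follows; the counting function, its name OCCUR, and its definition are my reconstruction, not a transcription of the 1973 input.

  ;; Hypothetical counting function: how many times does X occur in the list Y?
  ;; (Counts are Peano-style numbers, i.e., lists of NILs.)
  [OCCUR [LAMBDA [X Y] [COND Y [COND [EQUAL X [CAR Y]] [CONS NIL [OCCUR X [CDR Y]]] [OCCUR X [CDR Y]]] NIL]]]

  ;; Second correctness theorem, schematically:
  [EQUAL [OCCUR X A] [OCCUR X [SORT A]]]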
Here, for example, is the 1973 PLTP proof that SORT produces ORDERED output. The HTML text below was manually transcribed from pages 127-130 of the proveall output of 18 July, 1973. I may have inadvertently introduced typos. I have deleted some blank lines.
Definitions of non-primitives used in the proof were printed at the end of the proof log. Note that IMPLIES and AND were non-primitives: our logical connective was the 3-place COND (subsequently renamed IF). ADDTOLIS is the user-defined insertion function used by SORT; LTE is the user-defined Peano ``≤'' operator. The proof below shows PLTP doing two generalizations and three inductions — a straightforward induction on a linear list, a ``special'' induction with two base cases, and an induction on two variables simultaneously. The proof shows PLTP ``discovering'' the lemmas that insertion preserves orderedness and that i ≤ j or j ≤ i.
[T 6 8]   [ 16.22 18 JULY 1973]

THEOREM TO BE PROVED:
[ORDERED [SORT A]]

MUST TRY INDUCTION.

INDUCT ON A.

THE THEOREM TO BE PROVED IS NOW:
[AND [ORDERED [SORT NIL]] [IMPLIES [ORDERED [SORT A]] [ORDERED [SORT [CONS A1 A]]]]]

WHICH IS EQUIVALENT TO:
[COND [ORDERED [SORT A]] [ORDERED [ADDTOLIS A1 [SORT A]]] T]

GENERALIZE COMMON SUBTERMS BY REPLACING [SORT A] BY GENRL1.

THE GENERALIZED TERM IS:
[COND [ORDERED GENRL1] [ORDERED [ADDTOLIS A1 GENRL1]] T]

MUST TRY INDUCTION. (SPECIAL CASE REQUIRED)

INDUCT ON GENRL1.

THE THEOREM TO BE PROVED IS NOW:
[AND [COND [ORDERED NIL] [ORDERED [ADDTOLIS A1 NIL]] T] [AND [COND [ORDERED [CONS GENRL11 NIL]] . [ORDERED [ADDTOLIS A1 [CONS GENRL11 NIL]]] . T] [IMPLIES [COND [ORDERED [CONS GENRL12 GENRL1]] [ORDERED [ADDTOLIS A1 [CONS GENRL12 GENRL1]]] T] [COND [ORDERED [CONS GENRL11 [CONS GENRL12 GENRL1]]] [ORDERED [ADDTOLIS A1 [CONS GENRL11 [CONS GENRL12 GENRL1]]]] T]]]]

WHICH IS EQUIVALENT TO:
[COND [LTE A1 GENRL11] T [LTE GENRL11 A1]]

MUST TRY INDUCTION.

INDUCT ON GENRL11 AND A1.

THE THEOREM TO BE PROVED IS NOW:
[AND [AND [COND [LTE A1 NIL] T [LTE NIL A1]] [COND [LTE NIL GENRL11] T [LTE GENRL11 NIL]]] [IMPLIES [COND [LTE A1 GENRL11] T [LTE GENRL11 A1]] [COND [LTE [CONS A11 A1] [CONS GENRL111 GENRL11]] T [LTE [CONS GENRL111 GENRL11] [CONS A11 A1]]]]]

WHICH IS EQUIVALENT TO:
T

FUNCTION DEFINITIONS:
[SORT [LAMBDA [X] [COND X [ADDTOLIS [CAR X] [SORT [CDR X]]] NIL]]]
[ORDERED [LAMBDA [X] [COND X [COND [CDR X] [COND [LTE [CAR X] [CAR [CDR X]]] [ORDERED [CDR X]] NIL] T] T]]]
[LTE [LAMBDA [X Y] [COND X [COND Y [LTE [CDR X] [CDR Y]] NIL] T]]]
[ADDTOLIS [LAMBDA [X Y] [COND Y [COND [LTE X [CAR Y]] [CONS X Y] [CONS [CAR Y] [ADDTOLIS X [CDR Y]]]] [CONS X NIL]]]]
[IMPLIES [LAMBDA [X Y] [COND X [COND Y T NIL] T]]]
[AND [LAMBDA [X Y] [COND X [COND Y T NIL] NIL]]]

GENERALIZATIONS:
GENRL1 = [SORT A]

PROFILE:
[/ [A] , / E N R / E N R G S1 [GENRL1] , / E N R / E N R / E N R / E N R / E N R [GENRL11 A1] , / E N R / E N R .]

TIME: 57.06 SECS.
Our choice of Pure Lisp as the logic in which to conduct our proofs was
inspired by John
McCarthy's A Basis for a
Mathematical Theory of Computation. Indeed, we were so enamored with the
idea of LISP as a logic that PLTP used IF
to represent the
conjunction of subgoals. The style in which we organized PLTP was inspired
by similar provers by Bob Boyer's dissertation advisor, W. W. Bledsoe.
We implemented PLTP in POP-2, an imperative Lisp-like programming language with an Algol-like syntax developed at the University of Edinburgh by Robin Popplestone and Rod Burstall. POP-2 was the language supported by the computing platform available to us in 1973: an ICL 4130 processor with 64K bytes of 2-microsecond memory shared among 8 interactive jobs controlled by teletype input from the users. File input and output was by paper tape and lineprinter. The modern reader may have a hard time imagining theorem-proving research conducted under such constraints!
We regarded PLTP as an experimental vehicle for exploring inductive proofs.
We were the only users and our computing resources were extremely limited.
We therefore took many shortcuts that we abandoned as our subsequent provers
attracted other users and computing resources grew exponentially. The most
egregious shortcut was that the prover did not check that the s-expression
provided as input was well-formed. It was just implicit that we'd call the
prover only on formulas! There were many such implicit understandings, e.g.,
no definitional principle was implemented, some variable symbols were
implicitly reserved for introduction by the prover, variable symbols
beginning with the letters I
through N
were
implicitly treated as numeric, etc. In addition, we inevitably made coding
mistakes, some of which we found and fixed and others of which apparently
escaped our notice until we assembled this archive! We include a discussion
of all ``bugs'' discovered in the 14 September, 1973, listings below.
Importantly in this regard, our whole understanding of what constituted a
``bug'' in a theorem prover has evolved in the 45 years from PLTP's origin as
an experimental system used by two people to the adoption of theorem provers
and proof assistants as trusted industrial tools.
A brief paper about PLTP, Proving Theorems about LISP Functions, was presented at the Third International Joint Conference on Artificial Intelligence (IJCAI-73), 20-23 August 1973, at Stanford University. A longer version of the paper, with the same title, subsequently appeared in the JACM in 1975. Proper citations are given below.
PLTP was the beginning of a sequence of Boyer-Moore-Kaufmann provers for applicative Lisp: the unnamed prover described in A Computational Logic, Boyer and Moore, Academic Press, 1979, as well as NQTHM and ACL2.
PLTP itself passed into history.
In the spring of 2017, while looking through old boxes of papers (trying to find the documentation for the 77 Editor, for which Boyer and I invented the notion of ``piece table'' used by Charles Simonyi in the Xerox PARC text editor Bravo and subsequently in Microsoft Word), we found a complete listing of the source code for PLTP as of 14 September, 1973. We also found various 5-hole paper tapes purporting to contain various files from the era. But absent a 5-hole paper tape reader, our only readable records of PLTP's source code are the faded lineprinter listings from 1973. These include many versions of various component files and one complete listing of all relevant POP-2 source code files for PLTP made on 14 September, 1973. That listing was labeled in 1973 with the handwritten note ``the theorem prover as reported in J's thesis.''
We also found lineprinter listings of the proveall and the proveall output of a version of PLTP predating the dissertation by a couple of months.
In the summer of 2017, working directly from the original POP-2 sources, I recoded PLTP in ACL2. However, I took certain liberties: I corrected certain bugs I discovered and refactored some subroutines for clarity without (intentionally) changing their behavior. While any computation can be mimicked in any Turing-complete programming language like ACL2, I simply (gratuitously?) balked at trying to mimic exactly the methods of some PLTP routines — memory limitations of the 1970s drove us to shortcuts, a convoluted coding style, the use of numerous side-effects on global variables and data structures, and the use of the peculiar feature of POP-2 by which one could ``surreptitiously'' pass arguments to future subroutine calls by leaving ``extra'' results on a stack.
In my reconstruction I tended to use more straightforward functional programming idioms, like defining one ACL2 function to compute the results of a given POP-2 routine and another ACL2 function to compute the side-effects. All of this — including the bugs discovered and the ``gratuitous'' stylistic changes — is documented below in great detail.
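As a schematic illustration of that idiom (this is not actual PLTP(A) source and the names are hypothetical), a POP-2 routine that both rewrote a term and pushed a record onto the global ANALYSIS might be modeled by two ACL2 functions, one computing the value and one computing the new ANALYSIS:

  ; Value component of a hypothetical EVALUATE step: returns the (possibly
  ; rewritten) term; here a stand-in that leaves the term unchanged.
  (defun evaluate-value (term)
    term)

  ; Effect component: returns the ANALYSIS that the POP-2 code would have left
  ; in its global variable, here modeled by consing a record onto the old value.
  (defun evaluate-analysis (term analysis)
    (cons term analysis))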
We call the resulting prover PLTP(A). PLTP(A) focuses on what we call the September core: the proof methods implemented as of 14 September, 1973. It ignores extraneous issues like the command one would have invoked in 1973 to run the regression suite, the code to reproduce exactly the way we prettyprinted formulas in 1973, and the English proof narrative produced by PLTP. I then ``successfully'' ran the 1973 regression with PLTP(A). I put ``successfully'' in quotes because two theorems failed, but I argue in the PLTP(A) material below that PLTP could not have proved these theorems and that they were in the suite as challenges to work on.
We also undertook an ultimately unsuccessful effort to get the original POP-2 sources for PLTP to run on a modern machine. While no modern implementation of POP-2 exists (as far as we are aware), Passmore found a BESM-6 emulator and a binary image for a Russian BESM-6 computer booted with a POP-2 image from the early 1970s. We call that system the emulated Russian POP-2 here.
The emulated Russian POP-2 was quite difficult to work with — it is undocumented, fragile, and has extremely limited memory resources (the BESM-6 had even less memory than the ICL 4130 on which PLTP ran). Certain legal POP-2 programs cannot be compiled, or are compiled or emulated incorrectly. The compiler silently truncates identifiers to 6 characters (whereas the 1973 Edinburgh compiler truncated to 8), thus causing some invisible name collisions. The compiler also suffered stack overflows trying to process even modestly nested POP-2 expressions, and we had to refactor the code with many jumps and assignments to get it to compile. We use the name PLTP(R) for the PLTP code refactored to compile under the emulated Russian POP-2. But even after apparently successful compilation, PLTP(R) produced runtime errors or unexplained failures on some proof attempts of theorems in the regression.
We ultimately gave up on this attempt since it was clear that, even if we could get the emulated Russian POP-2 to run the regression successfully on some modified version of the code, it would not demonstrate how the original source code ran. The ACL2 and OCaml implementations, PLTP(A) and PLTP(O), are better illustrations of the capabilities of PLTP than PLTP(R).