The first general-purpose automatic theorem prover explicitly designed for the mathematics behind program verification was the Edinburgh Pure Lisp Theorem Prover, otherwise known as PLTP. PLTP was created in 1972 and 1973 by Boyer and me while we were in the Department of Computational Logic (formerly, the Metamathematics Unit), School of Artificial Intelligence, University of Edinburgh, Scotland. It was a direct response to the inadequacies, for program verification purposes, of the then-popular resolution-based first-order provers (upon which we had been working). The Annotated Bibliography below provides scanned images of the 1973 POP-2 code for PLTP, its input, and its output, as well as links to other relevant material including POP-2 reference manuals, my dissertation, and articles about PLTP published in 1973 and 1975. In addition, we provide two modern (2017) recreations of PLTP, one in ACL2 and the other in OCaml. But first I would like to describe the origins of PLTP and of this archive.
For example, we could prove that there exists an X such that the recursively defined (LENGTH X) is equal to 3. But we could not prove that (LENGTH (APPEND X Y)) is (+ (LENGTH X) (LENGTH Y)). The latter conjecture requires mathematical induction.
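To see why induction is needed, here is a brief sketch in the same notation as the conjectures above. The recursive definitions of LENGTH and APPEND shown are the usual ones and are my illustration, not a transcription of anything we actually ran.

  ;; Usual recursive definitions (illustrative):
  (LENGTH X)   = (IF X (+ 1 (LENGTH (CDR X))) 0)
  (APPEND X Y) = (IF X (CONS (CAR X) (APPEND (CDR X) Y)) Y)

  ;; An existential claim such as ``there is an X with (LENGTH X) = 3'' can be
  ;; settled by exhibiting a witness and computing.

  ;; But (LENGTH (APPEND X Y)) = (+ (LENGTH X) (LENGTH Y)) must hold for ALL X and Y.
  ;; For non-empty X, unfolding the definitions gives
  (LENGTH (APPEND X Y))     = (+ 1 (LENGTH (APPEND (CDR X) Y)))
  (+ (LENGTH X) (LENGTH Y)) = (+ 1 (+ (LENGTH (CDR X)) (LENGTH Y)))
  ;; so finishing the proof requires the very same claim, as an induction
  ;; hypothesis, about (CDR X).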
Our focus on proving properties of programs as a test bed for our efforts in resolution led us by 1972 to the realization that we needed to build a prover capable of induction.
The standard approach to program verification in the early 1970s was to (a) annotate a program with assertions, (b) use one of several methods to generate ``verification condition'' formulas, and then (c) use an ad hoc simplifier or a resolution theorem prover to attempt to prove those formulas. It was not uncommon for researchers of the day to add ``lemmas'' or specially tailored ``axioms'' to the set of input assumptions in order for the prover to succeed.
The reason for this is one we pointed out in the Preface of our 1979 book A Computational Logic (Academic Press):
One reason that researchers have had to assume ``lemmas'' so freely is that they have not implemented the principle of mathematical induction in their theorem-proving systems. Since mathematical induction is a fundamental rule of inference for the objects about which computer programmers think (e.g., integers, sequences, trees), it is surprising that anyone would implement a theorem-prover for program verification that could not make inductive arguments.

We speculated on the possibility that researchers (unconsciously) assumed that the only role for induction was to unwind loops and handle procedure calls — roles fulfilled by the program semantics. But regardless of why, it is a fact that induction was not supported even though many verification conditions involved properties of inductively constructed objects.
Thus 1972 found us proving theorems about recursive list-processing functions, by hand on the blackboard, and discussing the techniques and heuristics we were using. The main questions were which of those techniques were effective and how they could be mechanized.
These exercises soon led us to implement our methods in the Edinburgh Pure Lisp Theorem Prover, PLTP. Briefly put, PLTP was a fully automatic theorem prover for properties of recursive functions over list structures. Since our focus was on induction, we limited the language to a homegrown version of Pure Lisp supporting just the object NIL and ordered pairs constructed by CONS. The language included CAR and CDR, a 3-place IF (actually named ``COND'' in early work), and EQUAL. Natural numbers were just abbreviations for lists of NILs, giving rise to Peano-like definitions of arithmetic. The user could define recursive functions and challenge the prover to prove properties. It was not possible for the user to offer hints, lemmas, or other help.
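For concreteness, here is a sketch in the square-bracket notation of the 1973 listings. The number representation follows the description above; the addition function and its name PLUS are my illustrative reconstruction, while LTE is copied verbatim from the proof log later on this page.

  ;; The only objects are NIL and CONS-pairs, so the natural number n is the list of n NILs:
  0 = NIL
  3 = [CONS NIL [CONS NIL [CONS NIL NIL]]]

  ;; A Peano-like addition on that representation (hypothetical name and body):
  [PLUS [LAMBDA [X Y] [COND X [CONS NIL [PLUS [CDR X] Y]] Y]]]

  ;; Compare LTE, the user-defined Peano ``less than or equal'' from the proof log below:
  [LTE [LAMBDA [X Y] [COND X [COND Y [LTE [CDR X] [CDR Y]] NIL] T]]]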
In this setting we could explore automatic proofs of the most elementary results of elementary number theory, as well as properties of sequences and trees and of functions that mixed those theories; that is, we could pursue the kinds of proofs required by program verification.
Roughly speaking, PLTP took an input formula (as a term) and simplified it by repeatedly applying ``symbolic evaluation'' (see EVALUATE in Listings F and H), ``rewriting'' (see REWRITE), and ``reduction'' (see REDUCE) until it no longer changed. After simplification, PLTP applied ``fertilization'' (see FERTILIZE), ``generalization'' (see GENRLIZE), and induction (see INDUCT).
Fertilization replaced one side of an equality induction hypothesis by the
other in certain occurrences in the induction conclusion and ``discarded''
the hypothesis. Generalization replaced some terms by new variables,
possibly restricting the values of the new variables to satisfy certain automatically synthesized
``type functions.'' Finally, selection of an induction schema was driven by
the failures of symbolic evaluation to expand calls of recursive functions in
the conjecture.
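To illustrate fertilization, here is a schematic reconstruction (not a transcription of 1973 output) of the induction step for the associativity of list concatenation, one of the proveall theorems, assuming the usual definition [APPEND [LAMBDA [X Y] [COND X [CONS [CAR X] [APPEND [CDR X] Y]] Y]]].

  ;; Induction step after inducting on A (schematically):
  [IMPLIES [EQUAL [APPEND [APPEND A B] C] [APPEND A [APPEND B C]]]
           [EQUAL [APPEND [APPEND [CONS A1 A] B] C] [APPEND [CONS A1 A] [APPEND B C]]]]

  ;; Symbolic evaluation of the APPENDs applied to [CONS A1 A] reduces the conclusion to
  [EQUAL [CONS A1 [APPEND [APPEND A B] C]] [CONS A1 [APPEND A [APPEND B C]]]]

  ;; Fertilization then replaces one side of the hypothesis equality by the other inside
  ;; the conclusion and ``discards'' the hypothesis, leaving an equality of identical
  ;; terms, which simplifies to T.  (Exactly which occurrence PLTP replaced, and whether
  ;; rewriting first strips the matching CONSes, is glossed over in this sketch.)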
We structured the 1973 PLTP somewhat differently than our subsequent NQTHM and ACL2 provers. Let me digress to point out some design choices that readers familiar with our more recent provers may find surprising.
PLTP represented the conjunction of subgoals as a single term built with IFs. That is, all of the subgoals of a proof were expressed in a single term (see the sketch after these design notes). We didn't introduce clauses, the ``waterfall,'' and the ``pool'' until the prover described in A Computational Logic (Academic Press, 1979).
The process of repeatedly applying EVALUATE, REWRITE, and REDUCE until the result was stable was called by the non-word ``normalation'' in the early copies of Moore's dissertation and in the 14 September, 1973, source code (see the definition of NORMALATE on page 14 of Listing-F). However, Moore's PhD examining committee (Bernard Meltzer, Rod Burstall, Robin Milner, and David Cooper) asked us to name the process with an English word. We chose ``simplification'' and we defined the function SIMPLIFY in subsequent versions of PLTP. See, for example, page 18 of Listing-A, printed 2 November, 1973.
REWRITE applied a few built-in rules, e.g., (CAR (CONS X Y)) = X and (IF X Y Y) = Y. But there was no way to store or use rules derived from previously proved theorems.
Symbolic evaluation (EVALUATE) used side-effects to maintain a group of global and locally-bound ``special'' variables to be used by induction. The main such variable was named ANALYSIS and it accumulated ``bomb lists'' recording recursive calls that blocked the symbolic expansion of recursive functions. ANALYSIS was accessed during induction processing by PICKINDVARS.
The induction schema PLTP used for [p X] when X is a linear list was:

[AND [p NIL] [IMPLIES [p X] [p [CONS X1 X]]]]

Our subsequent provers used a schema like this for linear lists:

(AND (IMPLIES (NOT (CONSP X)) (p X)) (IMPLIES (AND (CONSP X) (p (CDR X))) (p X)))

Obvious generalizations of this latter scheme allow one to provide induction hypotheses about arbitrary objects smaller than X, not just explicit structural components of X.
The function that applied fertilization, generalization, and induction was named ARTIFINTEL (for ``artificial intelligence'') because:

COMMENT 'THIS FUNCTION APPLIES FERTILIZATION AND IF THAT FAILS TRIES GENERALIZING AND INDUCTING. IT IS CAREFUL TO WORK ONLY ON THE FIRST CONJUNCT IF THE THEOREM IS A CONJUNCT. FOR THIS IT GETS THE NAME "ARTIFICIAL INTELLIGENCE", BEING ABOUT THE SMARTEST PROGRAM IN THE THEOREM PROVER.`;

In our subsequent provers, in which the goal was represented by the implicit conjunction of clauses in the pool, we generalized this architecture and collected simplification, fertilization, generalization, and, eventually, other processes like destructor elimination and irrelevance elimination into the waterfall with induction at the bottom. Since PLTP's ARTIFINTEL focused generalization and induction on the first conjunct, PLTP acted as though it had a pool and a (small) waterfall. Unlike our subsequent provers, PLTP eagerly simplified and fertilized all the formulas in the pool before resorting to generalization, etc.
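To make the first design note concrete: because AND and IMPLIES were non-primitives defined in terms of the 3-place COND (the two definitions below are exactly the ones printed in the proof log later on this page), a goal of the form [AND base [IMPLIES hyp concl]] was itself just one nested COND/IF term, so all of the subgoals really did live in a single term. The expansion shown is schematic and ignores whatever simplifications PLTP applied along the way.

  [AND     [LAMBDA [X Y] [COND X [COND Y T NIL] NIL]]]
  [IMPLIES [LAMBDA [X Y] [COND X [COND Y T NIL] T]]]

  ;; Expanding AND and IMPLIES in [AND [p NIL] [IMPLIES [p X] [p [CONS X1 X]]]]
  ;; yields the single term
  [COND [p NIL]
        [COND [COND [p X] [COND [p [CONS X1 X]] T NIL] T] T NIL]
        NIL]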
1973 was a very active time and many different experimental versions of the system were tried, mostly involving minor tweaks to heuristics. To maintain our sanity we had a regression suite of function definitions and theorems, called the proveall, and ran the modified code on the proveall every time we changed the system.
By the end of 1973 we had a reasonably large corpus of Pure Lisp definitions
and properties in our proveall regression suite. Among the properties proved
fully automatically were that list concatenation is associative, reverse is
its own inverse, and insertion sort is correct. The last was stated as two
theorems, that insertion sort produces ordered output and that, for
all x, the number of times x occurs in the input is equal to
the number of times x occurs in the output. Images of lineprinter
listings of PLTP's proveall and the proof output are included in the
bibliography below. (By the way, the second half of the correctness of
insertion sort — the part about SORT
preserving the number
of occurrences — is not proved in the 18 July, 1973 proveall listing we
have; indeed, it is marked with a *
in the input, indicating that
it was a challenge we were working on. It is listed among the successful
theorems in the dissertation some months later.)
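In PLTP's notation the first of those theorems is the goal [ORDERED [SORT A]] shown in the proof log below. The second can be rendered schematically as follows; the counting function, its name OCCUR, and its definition are my reconstruction, not a transcription of the 1973 input.

  ;; Hypothetical counting function: how many times does X occur in the list Y?
  ;; (Counts are Peano-style numbers, i.e., lists of NILs.)
  [OCCUR [LAMBDA [X Y] [COND Y [COND [EQUAL X [CAR Y]] [CONS NIL [OCCUR X [CDR Y]]] [OCCUR X [CDR Y]]] NIL]]]

  ;; Second correctness theorem, schematically:
  [EQUAL [OCCUR X A] [OCCUR X [SORT A]]]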
Here, for example, is the 1973 PLTP proof that SORT produces ORDERED output. The HTML text below was manually transcribed from pages 127-130 of the proveall output of 18 July, 1973. I may have inadvertently introduced typos. I have deleted some blank lines.
Definitions of non-primitives used in the proof were printed at the end of the proof log. Note that IMPLIES and AND were non-primitives: our logical connective was the 3-place COND (subsequently renamed IF). ADDTOLIS is the user-defined insertion function used by SORT; LTE is the user-defined Peano ``≤'' operator. The proof below shows PLTP doing two generalizations and three inductions — a straightforward induction on a linear list, a ``special'' induction with two base cases, and an induction on two variables simultaneously. The proof shows PLTP ``discovering'' the lemmas that insertion preserves orderedness and that i ≤ j or j ≤ i.
[T 6 8]   [ 16.22 18 JULY 1973]

THEOREM TO BE PROVED:
[ORDERED [SORT A]]

MUST TRY INDUCTION.

INDUCT ON A.

THE THEOREM TO BE PROVED IS NOW:
[AND [ORDERED [SORT NIL]] [IMPLIES [ORDERED [SORT A]] [ORDERED [SORT [CONS A1 A]]]]]

WHICH IS EQUIVALENT TO:
[COND [ORDERED [SORT A]] [ORDERED [ADDTOLIS A1 [SORT A]]] T]

GENERALIZE COMMON SUBTERMS BY REPLACING [SORT A] BY GENRL1.

THE GENERALIZED TERM IS:
[COND [ORDERED GENRL1] [ORDERED [ADDTOLIS A1 GENRL1]] T]

MUST TRY INDUCTION. (SPECIAL CASE REQUIRED)

INDUCT ON GENRL1.

THE THEOREM TO BE PROVED IS NOW:
[AND [COND [ORDERED NIL] [ORDERED [ADDTOLIS A1 NIL]] T] [AND [COND [ORDERED [CONS GENRL11 NIL]] . [ORDERED [ADDTOLIS A1 [CONS GENRL11 NIL]]] . T] [IMPLIES [COND [ORDERED [CONS GENRL12 GENRL1]] [ORDERED [ADDTOLIS A1 [CONS GENRL12 GENRL1]]] T] [COND [ORDERED [CONS GENRL11 [CONS GENRL12 GENRL1]]] [ORDERED [ADDTOLIS A1 [CONS GENRL11 [CONS GENRL12 GENRL1]]]] T]]]]

WHICH IS EQUIVALENT TO:
[COND [LTE A1 GENRL11] T [LTE GENRL11 A1]]

MUST TRY INDUCTION.

INDUCT ON GENRL11 AND A1.

THE THEOREM TO BE PROVED IS NOW:
[AND [AND [COND [LTE A1 NIL] T [LTE NIL A1]] [COND [LTE NIL GENRL11] T [LTE GENRL11 NIL]]] [IMPLIES [COND [LTE A1 GENRL11] T [LTE GENRL11 A1]] [COND [LTE [CONS A11 A1] [CONS GENRL111 GENRL11]] T [LTE [CONS GENRL111 GENRL11] [CONS A11 A1]]]]]

WHICH IS EQUIVALENT TO:
T

FUNCTION DEFINITIONS:
[SORT [LAMBDA [X] [COND X [ADDTOLIS [CAR X] [SORT [CDR X]]] NIL]]]
[ORDERED [LAMBDA [X] [COND X [COND [CDR X] [COND [LTE [CAR X] [CAR [CDR X]]] [ORDERED [CDR X]] NIL] T] T]]]
[LTE [LAMBDA [X Y] [COND X [COND Y [LTE [CDR X] [CDR Y]] NIL] T]]]
[ADDTOLIS [LAMBDA [X Y] [COND Y [COND [LTE X [CAR Y]] [CONS X Y] [CONS [CAR Y] [ADDTOLIS X [CDR Y]]]] [CONS X NIL]]]]
[IMPLIES [LAMBDA [X Y] [COND X [COND Y T NIL] T]]]
[AND [LAMBDA [X Y] [COND X [COND Y T NIL] NIL]]]

GENERALIZATIONS:
GENRL1 = [SORT A]

PROFILE:
[/ [A] , / E N R / E N R G S1 [GENRL1] , / E N R / E N R / E N R / E N R / E N R [GENRL11 A1] , / E N R / E N R .]

TIME: 57.06 SECS.
Our choice of Pure Lisp as the logic in which to conduct our proofs was
inspired by John
McCarthy's A Basis for a
Mathematical Theory of Computation. Indeed, we were so enamored with the
idea of LISP as a logic that PLTP used IF
to represent the
conjunction of subgoals. The style in which we organized PLTP was inspired
by similar provers by Bob Boyer's dissertation advisor, W. W. Bledsoe.
We implemented PLTP in POP-2, an imperative Lisp-like programming language with an Algol-like syntax developed at the University of Edinburgh by Robin Popplestone and Rod Burstall. POP-2 was the language supported by the computing platform available to us in 1973: an ICL 4130 processor with 64K bytes of 2-microsecond memory shared among 8 interactive jobs controlled by teletype input from the users. File input and output was by paper tape and lineprinter. The modern reader may have a hard time imagining theorem-proving research conducted under such constraints!
We regarded PLTP as an experimental vehicle for exploring inductive proofs.
We were the only users and our computing resources were extremely limited.
We therefore took many shortcuts that we abandoned as our subsequent provers
attracted other users and computing resources grew exponentially. The most
egregious shortcut was that the prover did not check that the s-expression
provided as input was well-formed. It was just implicit that we'd call the
prover only on formulas! There were many such implicit understandings, e.g.,
no definitional principle was implemented, some variable symbols were
implicitly reserved for introduction by the prover, variable symbols
beginning with the letters I
through N
were
implicitly treated as numeric, etc. In addition, we inevitably made coding
mistakes, some of which we found and fixed and others of which apparently
escaped our notice until we assembled this archive! We include a discussion
of all ``bugs'' discovered in the 14 September, 1973, listings below.
Importantly in this regard, our whole understanding of what constituted a
``bug'' in a theorem prover has evolved in the 45 years from PLTP's origin as
an experimental system used by two people to the adoption of theorem provers
and proof assistants as trusted industrial tools.
A brief paper about PLTP, Proving Theorems about LISP Functions, was presented at the Third International Joint Conference on Artificial Intelligence (IJCAI-73), 20-23 August 1973, at Stanford University. A longer version of the paper, with the same title, subsequently appeared in the JACM in 1975. Proper citations are given below.
PLTP was the beginning of a sequence of Boyer-Moore-Kaufmann provers for applicative Lisp: the unnamed prover described in A Computational Logic, Boyer and Moore, Academic Press, 1979, as well as NQTHM and ACL2.
PLTP itself passed into history.
In the spring of 2017, while looking through old boxes of papers (trying to find the documentation for the 77 Editor, for which Boyer and I invented the notion of ``piece table'' used by Charles Simonyi in the Xerox PARC text editor Bravo and subsequently in Microsoft Word), we found a complete listing of the source code for PLTP as of 14 September, 1973. We also found various 5-hole paper tapes purporting to contain various files from the era. But absent a 5-hole paper tape reader, our only readable records of PLTP's source code are the faded lineprinter listings from 1973. These include many versions of various component files and one complete listing of all relevant POP-2 source code files for PLTP made on 14 September, 1973. That listing was labeled in 1973 with the handwritten note ``the theorem prover as reported in J's thesis.''
We also found lineprinter listings of the proveall and the proveall output of a version of PLTP predating the dissertation by a couple of months.
In the summer of 2017, working directly from the original POP-2 sources, I recoded PLTP in ACL2. However, I took certain liberties: I corrected certain bugs I discovered and refactored some subroutines for clarity without (intentionally) changing their behavior. While any computation can be mimicked in any Turing-complete programming language like ACL2, I simply (gratuitously?) balked at trying to mimic exactly the methods of some PLTP routines — memory limitations of the 1970s drove us to shortcuts, a convoluted coding style, the use of numerous side-effects on global variables and data structures, and the use of the peculiar feature of POP-2 by which one could ``surreptitiously'' pass arguments to future subroutine calls by leaving ``extra'' results on a stack.
In my reconstruction I tended to use more straightforward functional programming idioms, like defining one ACL2 function to compute the results of a given POP-2 routine and another ACL2 function to compute the side-effects. All of this — including the bugs discovered and the ``gratuitous'' stylistic changes — is documented below in great detail.
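As a schematic illustration of that idiom (this is not actual PLTP(A) source and the names are hypothetical), a POP-2 routine that both rewrote a term and pushed a record onto the global ANALYSIS might be modeled by two ACL2 functions, one computing the value and one computing the new ANALYSIS:

  ; Value component of a hypothetical EVALUATE step: returns the (possibly
  ; rewritten) term; here a stand-in that leaves the term unchanged.
  (defun evaluate-value (term)
    term)

  ; Effect component: returns the ANALYSIS that the POP-2 code would have left
  ; in its global variable, here modeled by consing a record onto the old value.
  (defun evaluate-analysis (term analysis)
    (cons term analysis))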
We call the resulting prover PLTP(A). PLTP(A) focuses on what we call the September core: the proof methods implemented as of 14 September, 1973. It ignores extraneous issues like the command one would have invoked in 1973 to run the regression suite, the code to reproduce exactly the way we prettyprinted formulas in 1973, and the English proof narrative produced by PLTP. I then ``successfully'' ran the 1973 regression with PLTP(A). I put ``successfully'' in quotes because two theorems failed, but I argue in the PLTP(A) material below that PLTP could not have proved these theorems and that they were in the suite as challenges to work on.
We also undertook an ultimately unsuccessful effort to get the original POP-2 sources for PLTP to run on a modern machine. While no modern implementation of POP-2 exists (as far as we are aware), Passmore found a BESM-6 emulator and a binary image for a Russian BESM-6 computer booted with a POP-2 image from the early 1970s. We call that system the emulated Russian POP-2 here.
The emulated Russian POP-2 was quite difficult to work with — it is undocumented, fragile, and has extremely limited memory resources (the BESM-6 had even less memory than the ICL 4130 on which PLTP ran). Certain legal POP-2 programs cannot be compiled, or are compiled or emulated incorrectly. The compiler silently truncates identifiers to 6 characters (whereas the 1973 Edinburgh compiler truncated to 8), thus causing some invisible name collisions. The compiler also suffered stack overflows trying to process even modestly nested POP-2 expressions, and we had to refactor the code with many jumps and assignments to get it to compile. We use the name PLTP(R) for the PLTP code refactored to compile under the emulated Russian POP-2. But even after apparently successful compilation, PLTP(R) produced runtime errors or unexplained failures on some proof attempts of theorems in the regression.
We ultimately gave up on this attempt since it was clear that, even if we could get the emulated Russian POP-2 to run the regression successfully on some modified version of the code, it would not demonstrate how the original source code ran. The ACL2 and OCaml implementations, PLTP(A) and PLTP(O), are better illustrations of the capabilities of PLTP than PLTP(R).