PLTP(A): An ACL2 Implementation of the Edinburgh Pure Lisp Theorem Prover of 1973

J Strother Moore
March, 2018

Abstract

This directory contains a contemporary (2017) implementation of PLTP coded in the applicative (functional) Common Lisp of ACL2. We name this system PLTP(A). PLTP(A) is almost a translation into ACL2 of the 14 September, 1973, listing of the POP-2 for PLTP. We explain the differences below. The material here is organized as follows. First we summarize the main files posted here so you can (i) browse the ACL2 source code for PLTP(A), (ii) inspect PLTP(A)'s standard proveall and JACM proveall, including the output of PLTP(A) on those regression suites, and (iii) read our argument that the two failures in its standard proveall do not indicate a discrepancy between PLTP and PLTP(A). Next, we describe the differences between the original PLTP and PLTP(A). The differences include bug fixes and some ``gratuitous'' changes to make the new code clearer. Finally, we explain how to get ACL2 and how to run PLTP(A).

Browsing PLTP(A)

Note: If you intend to run PLTP(A) under your local ACL2, we advise you to jump to Running PLTP(A) because the PLTP(A) files mentioned below require a certain amount of infrastructure (e.g., the definition of the Common Lisp symbol package named "PLTP"). But if you merely want to see how PLTP(A) is coded in ACL2 and what it can prove, browse here.

ACL2 source code for PLTP(A). PLTP(A) is implemented as a collection of :PROGRAM mode function and macro definitions in ACL2. A brief ``users manual'' is included in the comments at the top of the file.
Standard Proveall obtained by combining the definitions in the [/ DEFS] file of 14 September, 1973, with the [THEOREMS] file of that same date, both of which can be found in Listing I. The definitions and theorems here have been transcribed from POP-2's square-bracket list notation to standard Lisp notation. Furthermore, in the case of theorems using PLTP's numeric variables, additional NUMBERP hypotheses have been added as explained below. The output of PLTP(A) on this proveall (in ACL2 Version 8.0) is given in standard-proveall.output.txt.
JACM Proveall obtained by combining the definitions and theorems from Appendices A and B of Proving Theorems about LISP Functions, R.S. Boyer and J S. Moore. Journal of the Association for Computing Machinery, 22(1), 1975, pp.129-144. Many of the theorems in the JACM paper have the same formulas as theorems in the standard proveall, but some of the definitions in the JACM paper differ from their counterparts in the standard proveall. The output of PLTP(A) on this proveall (in ACL2 Version 8.0) is given in jacm-proveall.output.txt.
A discussion of the failures in the standard proveall. Two of the theorems in the standard proveall cannot be proved by PLTP(A) and have been commented-out of the scripts. In this discussion we argue that those theorems could not possibly have been proved by the 1973 PLTP either and were in the proveall as challenges to work on. Part of this argument involves proving those theorems in ACL2 itself (not PLTP(A)) and those ACL2 proof scripts are included in this discussion.

PLTP v PLTP(A)

The sources for The Edinburgh Pure Lisp Theorem Prover, aka PLTP, as of 14 September, 1973, are in Listing-F and Listing-H, and have been transcribed into text form in listings-f-and-h.txt.

PLTP was written in 1972 and 1973. It was programmed in POP-2, a now dead imperative programming language, and ran on an ICL 4130 processor with 64K bytes of 2 microsecond memory shared between 8 interactive jobs controlled by teletype input from the users. File input and output was by paper tape and lineprinter.

PLTP(A) is an applicative Common Lisp implementation of the PLTP proof techniques and heuristics as of 14 September, 1973. In fact, PLTP(A) is coded in ACL2, a descendent of PLTP. PLTP(A) is not a faithful translation of the 1973 code because the known bugs and oddities of PLTP's original code have been repaired and some of the algorithms have been implemented differently while preserving a high-level equivalency to the original code as discussed further below.

PLTP(A) supports the same basic Pure Lisp as PLTP did, with primitives NIL, CAR, CDR, CONS, EQUAL and IF. (Some versions of PLTP used the symbol COND instead of IF. PLTP(A) uses IF internally but defines PLTP's 3-place COND to be IF and can optionally print IF-expressions as COND-expressions.)

PLTP(A) supports PLTP's abbreviation conventions of T (denoting (CONS NIL NIL)) and natural numbers (denoting lists of NIL). PLTP's term representation left abbreviations in place. E.g., while exploring a term the 1973 theorem prover could have encountered both the number 0 and its formal meaning, NIL. This invited programming errors, as when the 1973 EVAL transformed (CDR n) to n-1, for all natural numbers n, because we (Boyer and Moore) forgot to check that n is not 0, resulting in the transformation of (CDR 0) to the non-term -1.

In contrast, PLTP(A) translates abbreviated terms into their formal correspondents at input so that the core theorem prover only sees formal terms. This not only makes its term exploration paradigms more regular but eliminates a variety of bugs and oddities such as PLTP's failure to always treat T, 1, and (CONS NIL NIL) the same even though all three denote the same term.

PLTP(A) ``untranslates'' before printing so printed terms contain the familiar abbreviations. This can be controlled by various ``feature switches'' set by the user. See the definition of set-feature in the PLTP(A) source code.

POP-2's 8-character limitation on names produced many obscure names, e.g., GENRLT1. I have changed the names of many implementation functions, e.g., GENRLT1 is here called generalizable-subterms and GENRLIZE is now called generalize. When the new name lacks obvious correspondence to the old name (as with GENRLT1), a comment in the code gives the old name.

PLTP(A) does not attempt to reproduce exactly PLTP's English explanations of the proof steps. (The main reason for this is that it requires cluttering the code with output routines and for PLTP(A) I chose to make the generation of output as inconspicuous as possible to focus on the clarity of the term manipulation code.) But PLTP(A) takes the same steps (subject to the caveats below), describes each step in English, and prints the same intermediate formulas (again, subject to caveats below).

No longer concerned with speed and memory footprint, I've freely recoded the algorithms to return the ``same'' results, often by more elegant means. But by ``same'' I mean the proof steps taken by PLTP(A) are the same as those by the 1973 PLTP. Every intermediate formula produced by PLTP(A) logically entails the corresponding formula produced by PLTP; in fact, if one allows for variable renaming and abbreviations, corresponding formulas are usually identical. For example, where PLTP might have generated and printed


[COND [NUMBERP GENRL1] [EQUAL [LENGTH GENRL1] GENRL1] [CONS NIL NIL]]

PLTP(A) might generate and print the variant (due to variable renaming):


(COND (NUMBERP GENRL2) (EQUAL (LENGTH GENRL2) GENRL2) (CONS NIL NIL))

or one of several equivalent abbreviations for that formula depending on the feature switches:


(COND (NUMBERP GENRL2) (EQUAL (LENGTH GENRL2) GENRL2) T)


(IF (NUMBERP GENRL2) (EQUAL (LENGTH GENRL2) GENRL2) T)


(IMPLIES (NUMBERP GENRL2) (EQUAL (LENGTH GENRL2) GENRL2)).

See the definition of set-feature for more explanations.

But I wrote ``usually identical'' above! There are two exceptions to this formula-by-formula identity-modulo-abbreviations: the output of PLTP(A)'s fertilization routine is a generalization of PLTP's, and the output of PLTP(A)'s induction routine is always a list-based inductive formula and never one tailored explicitly to Peano naturals. We discuss these two discrepancies, and some minor ones, below.

The important deviations from the implementation used in PLTP are as follows:

Term Representation: As previously noted, the internal form of terms in PLTP(A) is different: abbreviations have been translated away. This changes the case analysis of virtually every routine that explores terms — and eliminates several bugs.
Type information: PLTP contained code to determine the ``type'' of an expression. In the most general case, it actually created a new function, called the ``type function,'' to describe the output of a given function, fn. Its analysis was context free (insensitive both to the hypotheses governing the occurrence of the call of fn in question and insensitive to the actuals passed to fn) and was based on the type functions of the subfunctions used in the body of fn. It generated type functions on the fly during proofs, whenever it needed type information. It then stored the generated type function on the property list of fn for future use. Thus, PLTP routines that required type information side-effected the logical world.
I chose not to do this in PLTP(A) because it would have meant that every term-processing routine in PLTP(A) that accessed type information would have to return two results, the new term and the (possibly) modified world containing updated type information.
Instead, every time the user introduces a function in PLTP(A), its type function is generated then and there and stored. This makes the prover routines simpler and more valuable as pedagogical illustrations of the term processing techniques.
Because PLTP's type function generator was context free, the type functions generated by PLTP(A) at definition-time are the same as those PLTP generated at proof-time. (The above claim of equivalence is not entirely accurate because of the next deviation!)
Numeric Variable Symbols: PLTP, as of 14 September, 1973, provided the notion of numerically typed variable symbols. Recall that in PLTP's logic, the naturals were constructed, Peano-like, from NIL by successively CONSing NIL onto them. Such lists of NIL were recognized by the predicate NUMBERP.
Variable symbols (which at the time we called Skolem constants, from our days of resolution theorem proving) whose names began with the letters between I and N were understood to be numerically valued. Such symbols were recognized by ISNUMSKO in the POP-2 source code for PLTP. Theorems involving such numeric variables were thought of as implicitly carrying additional hypotheses constraining those variables to satisfy PLTP's NUMBERP predicate.
PLTP's type inference mechanism assumed any numeric variable was numerically valued. Furthermore, it would infer and store the output type of a defined function from its body.
But this meant that if the PLTP user defined, say, the identity function with a numeric formal, PLTP would have stored the type information that its output was numeric. But PLTP did not provide any enforcement of the implied type system!
For example the PLTP user could have written [DEFINE [FOO [LAMBDA [N] N]]] and it would be typed as numeric. Then, using the stored type function, PLTP could have proved the invalid formula (NUMBERP (FOO A)). But nothing would prevent the PLTP user from then calling FOO on a non-number and the result would be that non-number (since FOO here is the identity function), contrary to the theorem just mentioned.
The PLTP user &mdash and there were only two! &mdash could avoid this unsoundness simply by never using numeric variables as formals in definitions. Indeed, no definition in the standard regression suite or the JACM paper used numeric formals, so perhaps we recognized this problem at the time. (It should be understood that PLTP was an experimental research tool for investigations into induction and we tolerated many such implicit preconditions so we could get on with the research.)
Numeric variables were also supported in induction: instead of proving [p N] with its normal simple induction scheme for lists:
```
[AND [p NIL]
     [IMPLIES [p N] [p [CONS N1 N]]]]
```
PLTP used a scheme tailored to numerically typed variables:
```
[AND [p 0]
     [IMPLIES [p N] [p [ADD1 N]]]].
```
aka
```
[AND [p NIL]
     [IMPLIES [p N] [p [CONS NIL N]]]].
```
In creating PLTP(A), I therefore had the choice of either (a) enforcing a type system or (b) removing the whole notion of numeric variables. I chose the latter. PLTP(A) does not treat variables starting with the letters I through N any differently than other variables and it never does a numeric induction.
Theorems in the two PLTP(A) provealls that contained numeric variables have been modified here to carry explicit NUMBERP hypotheses.
Evaluation and Induction: PLTP's EVAL used side effects and about a dozen global variables to accumulate information to be used by the induction heuristic, including a so-called ``bomb list'' for each unexpanded recursive function call recording ``why'' that call could not be expanded. When PLTP needed to symbolically evaluate a term, it called EVAL and ignored the side-effected globals; when PLTP needed to induct, it called EVAL and ignored the result but inspected the final value of the global ANALYSIS which contained the bomb lists.
As with the code for type inference, mimicking this behavior in ACL2's functional setting would have produced hard-to-comprehend code because PLTP(A)'s functional eval would have to return both the simplified term and a modified list of global variables.
Instead, I split PLTP's EVAL into two ACL2 functions, one for computing the result and another for computing the side-effects. The PLTP(A) eval does not accumulate anything on the side, making it easier to see how PLTP did symbolic evaluation.
Then, to implement induction in PLTP(A), I re-construct the value of PLTP's ANALYSIS, with a separate function, called induction-analysis. Induction-analysis makes it clear what information induction uses.
While the 1973 PLTP implementation makes it obvious that induction is driven by the ``failures'' of the symbolic evaluator, PLTP(A)'s implementation makes it easier to understand how symbolic evaluation works and what information induction uses.
Fertilization: ``Fertilization'' was introduced in PLTP and describes the idea of using an equality induction hypothesis by selective substitution and then throwing the hypothesis away. However, in an effort to have our cake and eat it too, PLTP did not actually throw the hypothesis away: it merely hid it. It hid it by introducing a new function symbol of no arguments that was ``defined'' to be the negation of the equality and then disjoining a call of that new function to the formula resulting from throwing away the equality. This meant the output of the fertilization routine was equivalent to the input, which in turn meant that fertilization could freely recur root-and-branch through a formula fertilizing at will.
The problem with this approach is that it is that these new functions are not proper definitions: their bodies involve variables not among their formals. And if one extended the idea by adding those variables as formals to the new symbols, they would block the subsequent inductions that PLTP did.
PLTP(A) avoids the on-the-fly introduction of dubious definitions by deleting the equality after using it, which generalizes the formula. This is what NQTHM and ACL2 do. But this means that PLTP(A) cannot recur freely through the formula looking for equalities to use. It only fertilizes conjuncts of the top-level goal.
No published proof by PLTP required any other form of fertilization.
New Variable Names: As noted in bugs and oddities PLTP made certain implicit assumptions on user input that made it easier to generate ``new'' variable names by just taking on successively higher suffixes. While these implicit assumptions are never violated in the known regression runs, I chose to eliminate that implicit assumption in PLTP(A) and to generate variable names that are guaranteed to be new. This means PLTP(A) sometimes generates different variable names than PLTP did. For example, in the proof of [T 1 4] (page 10 of Listing J) we see PLTP generating the name GENRL1 the first time it generalizes and then generating GENRL2 and GENRL3 the second time. But when PLTP(A) is given the same conjecture to prove, it generates GENRL1 the first time it generalizes and GENRL1 and GENRL2 the second time because the previously generated GENRL1 no longer appears in the conjecture and so can be reused. While this discrepancy could be avoided, doing so needlessly complicates the name generation algorithm because it would require passing around the history of the proof attempt, not just the current goal.
Maximal Common Subterms: PLTP used fairly messy code to find the largest common subterms on opposites sides of equalities and implications; PLTP(A), on the other hand, just computes all the subterms of the two sides, intersects them, and then throws out any subterm contained in another. Whether this algorithm returns the list in the same order — or even whether the original algorithm really returned the alleged list of ``largest common subterms'' — is irrelevant here: PLTP(A)'s algorithm returns a list of the largest common subterms, its correctness is obvious, speed is irrelevant given the small problem size and today's resources, and the order of the subterms shouldn't matter because all of them are then generalized to new variables. If the subterms are listed in a different order then that will affect the names of the new variables assigned to each, giving rise to the ``variable renaming'' caveat above. But that shouldn't affect the proofs.
The ``Waterfall'': I refactored PLTP's main proof routine, PROVE1 to better expose what became the waterfall in our subsequent provers.
PLTP's PROVE1 explicitly called EVALUATE (which is just EVAL after initializing the globals used to accumulate induction analysis), REWRITE, and REDUCE successively and looped until the result stablized. While this process was called ``normalation'' in the 14 September, 1973, PLTP, in subsequent versions of PLTP and our other provers the comparable process was called simplification.
Once the formula was simplified, PLTP called a routine that Boyer and I facetiously named ARTIFINTEL (for ``artificial intelligence'' because according to a comment in the POP-2 code it was THE SMARTEST PROGRAM IN THE THEOREM PROVER because it only works on the first conjunct). ARTIFINTEL fertilized, generalized, and inducted on the first conjunct in the formula it was given. PLTP did not support elimination of destructors or of irrelevance, which were introduced later.
In PLTP(A), I wrap the EVALUATE/REWRITE/REDUCE loop up into a single function named simplify, and I just open-code ARTIFINTEL so fertilize, generalize, and induct are called explicitly by prove1.
PLTP(A) preserves PLTP's use of a single formula to represent the problem. For example, if simplification splits the problem into two subgoals, the result is an IF-expression representing their conjunction and fertilize, etc., just work on one conjunct at a time. It wasn't until NQTHM that Boyer and I started splitting out goals as clauses collected and implicitly conjoined in the pool at the bottom of the waterfall.

In PLTP's defense (not to mention that of Boyer and me!) I should note that the resource limitations of the 1970's ICL 4130 and the POP-2 compiler probably drove many of our implementation choices and coding style. It wasn't until 1989 and ACL2 that we adopted the much cleaner functional programming style. And that style alone does not guarantee clarity. Indeed, ACL2 today contains many less-than-elegant functional algorithms because performance still often demands code optimization despite the enormous improvement in processor speeds and memory sizes because applications have grown so much larger. Indeed, ACL2 supports functions that return a vector of values so that in one sweep of a huge formula many things can be accumulated.

The small sizes of the definitions and conjectures presented to PLTP was wonderfully freeing given today's resources!

The changes in implementation also reflect lessons learned from 45 years of coding provers together with the much more robust ACL2 programming environment I enjoy today.

Running PLTP(A)

PLTP(A) was added to ACL2's Community Books Repository in March, 2018. So if you have a recent ACL2 installation, you already have PLTP(A). Below we tell you how to run it. After that we give a brief summary of the interesting files you'll find on the books/projects/pltpa/ subdirectory of your ACL2 directory, and finally we tell you how to get ACL2 if you don't have it.

To play with PLTP(A), do

(include-book "projects/pltpa/pltpa" :dir :system)

and then select the PLTP symbol package and initialize the PLTP(A) logical world with

(pltpa)

Then you can define PLTP functions and submit challenge theorems as with

(define (append ; define a PLTP function
	 (lambda (x y)
	   (if x (cons (car x) (append (cdr x) y)) y))))

(prove assoc-of-append ; name is irrelevant except for io; theorems are not
                       ; stored in the PLTP world

       (equal (append (append a b) c)
              (append a (append b c))))

For a complete list of available commands, e.g., to inspect PLTP's database, or undo a definition, see the comment at the top of the PLTP(A) source file, which may be found in "books/projects/pltpa/pltpa.lisp" of your local ACL2 installation.

You can find all of the PLTP(A) files on the books/projects/pltpa/ subdirectory of your ACL2 directory.

Among the interesting files after certification of the books there you will find:

pltpa.lisp — the source file for PLTP(A)
pltpa.acl2 — the definition of the PLTP symbol package
standard-proveall.lsp — the PLTP(A) commands in its standard proveall regression suite
chk-standard-proveall.lisp — an ACL2 book that is certifiable iff PLTP(A) passes the standard proveall
chk-standard-proveall.cert.out — PLTP(A)'s output on the standard proveall (see note below)
jacm-proveall.lsp — the PLTP(A) commands in its JACM proveall regression suite
chk-jacm-proveall.lisp — an ACL2 book that is certifiable iff PLTP(A) passes the JACM proveall
chk-jacm-proveall.cert.out — PLTP(A)'s output on the JACM proveall (see note below)

Note on how to read the .cert.out files above: Ordinarily, an ACL2 book can be certified provided the ACL2 prover can successfully prove every theorem in it. So how do we make a book that ACL2 can certify iff PLTP(A) can prove the theorems in its own regression? That involves a certain amount of infrastructure. The two .cert.out files above contain the output of ACL2's certification process. That output starts and ends with ``noise'' related to ACL2's certification infrastructure. PLTP(A)'s output (e.g., on its standard proveall) may be found between the block of text that starts with

:STANDARD Proveall by PLTP(A)

and the block that starts with

Successful :STANDARD Proveall by PLTP(A) completed on

Of course, if you have a running PLTP(A) you can see the output of the provealls by evaluating (proveall :standard) and

(proveall
:jacm)

If you do not already have ACL2 then you must obtain and install it in order to run PLTP(A). This also involves installing one of several available Common Lisp implementations. Follow the installation instructions under the Obtaining, Installing, and License link on the ACL2 Home Page.

The Easy Install for Unix-like Systems instructions will fetch not only the ACL2 system source code but also all of the Community Books. It is easier to fetch them all than to fetch just what you need for PLTP(A). But because PLTP(A) does not depend on any other books, you can skip the general certification instructions (e.g., Step 5 of the Easy Install for Unix-like Systems) if all you want to do is play with PLTP(A). Instead, just build your saved_acl2 runnable image and then certify the PLTP(A) books by connecting to books/projects/pltpa in your local ACL2 installation and running:

../../build/cert.pl --acl2 ../../../saved_acl2 *.lisp