Major Section: ACL2 Documentation
Bob Boyer and Warren Hunt, and later Jared Davis and Sol Swords, have developed a canonical representation for ACL2 data objects and a function memoization mechanism to facilitate reuse of previously computed results. This facility includes procedures to read and print ACL2 expressions in such a way that repetition of some ACL2 objects is eliminated, thereby permitting a kind of on-the-fly file compression. The implementation does not alter the semantics of ACL2 except to add a handful of definitions.
We give the name ``ACL2(h)'' to the resulting experimental extension of the ACL2 system, which includes hash cons, function memoization, and fast association lists (applicative hash tables). It is optimized for Clozure Common Lisp (CCL), but other ANSI-compliant host Lisp implementations may also work, provided there are sufficient machine resources.
If you want to use ACL2(h), you might find it helpful to consult the document
centaur/README.html
in the ACL2 community books.
An easy way is to build ACL2(h) is to include the following with a `make' command:
ACL2_HONS=hSo for example, to make an executable image and also documentation (which will appear in subdirectories
doc/EMACS
and doc/HTML
):
make large DOC ACL2_HONS=h
(
memoize-summary
)
(cons-subtrees x nil)
builds a fast alist that associates each subtree
of X with T, without duplication.
(fast-alist-free alist)
throws away the hash table associated with a fast
alist.
(fast-alist-len alist)
counts the number of unique keys in a fast alist.
(fast-alist-summary)
prints some basic statistics about any current fast
alists.
(hons x y)
returns a normed object equal to (cons x y)
.
(hons-acons key val alist)
is the main way to create or extend
fast-alists.
(hons-acons! key val alist)
is an alternative to hons-acons
that
produces normed, fast alists.
(hons-assoc-equal key alist)
is not fast; it serves as the logical
definition for hons-get
.
(hons-clear gc)
is a drastic garbage collection mechanism that clears out
the underlying Hons Space.
(hons-copy x)
returns a normed object that is equal to X.
(hons-copy-persistent x)
returns a normed object that is equal to X
and which will be re-normed after any calls to hons-clear
.
(hons-equal x y)
is a recursive equality check that optimizes when parts of
its arguments are normed.
(hons-equal-lite x y)
is a non-recursive equality check that optimizes if
its arguments are normed.
(hons-get key alist)
is the efficient lookup operation for
fast-alists.
(hons-resize ...)
can be used to manually adjust the sizes of the hash
tables that govern which ACL2 Objects are considered normed.
(hons-shrink-alist alist ans)
can be used to eliminate "shadowed pairs"
from an alist or to copy fast-alists.
(hons-shrink-alist! alist ans)
is an alternative to hons-shrink-alist
that produces a normed result.
(hons-summary)
prints basic information about the sizes of the tables in
the current Hons Space.
(hons-wash)
is like gc$
but can also garbage collect normed
objects (CCL Only).
(make-fast-alist alist)
creates a fast-alist from the input alist,
returning alist
itself or, in some cases, a new object equal to it.
certain sense.
(number-subtrees x)
returns the number of distinct subtrees of X, in the
sense of equal
fast-alists
are used inefficiently
(with-fast-alist name form)
causes name
to be a fast alist for the
execution of form
.
(with-stolen-alist name form)
ensures that name
is a fast alist at the
start of the execution of form
. At the end of execution, it ensures that
name
is a fast alist if and only if it was originally. That is, if
name
was not a fast alist originally, its hash table link is freed, and if
it was a fast alist originally but its table was modified during the execution
of form
, that table is restored. Note that any extended table created from
the original fast alist during form
must be manually freed.
Much of the documentation for the remainder of this topic is taken from the paper ``Function Memoization and Unique Object Representation for ACL2 Functions'' by Robert S. Boyer and Warren A. Hunt, Jr., which has appeared in the Sixth International Workshop on the ACL2 Theorem Prover and Its Applications, ACM Digital Library, 2006.
In the implementation of the ACL2 logic, ACL2 data objects are represented by
Common Lisp objects of the same type, and the ACL2 pairing operation is
internally implemented by the Common Lisp cons
function. In Common
Lisp, cons
is guaranteed to provide a new pair, distinct from any
previously created pair. We have defined a new ACL2 function HONS
that
is logically identical to the ACL2 cons
function, but whose
implementation usually reuses an existing pair if its components are
identical to the components of an existing pair. A record of ACL2 HONS
objects is kept, and when an ACL2 function calls hons
ACL2 searches for
an existing identical pair before allocating a new pair; this operation been
called ``hash consing''.
It appears that hash consing was first conceived by A. P. Ershov in 1957, to speed up the recognition of common subexpressions. Ershov showed how to collapse trees to minimal DAGs by traversing trees bottom up, and he used hashing to eliminate the re-evaluation of common subexpressions. Later, Eiichi Goto implemented a Lisp system with a built-in hash consing operation: his h-CONS cells were rewrite protected and free of duplicate copies, and Goto used this hash consing operation to facilitate the implementation of a symbolic algebra system he developed.
Memoizing functions also has a long history. In 1967, Donald Michie proposed using memoized functions to improve the performance of machine learning. Rote learning was improved by a learning function not forgetting what it had previously learned; this information was stored as memoized function values.
The use of hash consing has appeared many times. For instance, Henry Baker used hash consing to improve the performance of the well-known Boyer rewriting benchmark. Baker used both hash consing and function memoization to improve the speed of the Takeuchi function, exactly in the spirit of our implementation, but without the automated, system-wide integration we report here.
The ACL2 implementation permits memoization of user-defined functions. During execution a user may enable or disable function memoization on an individual function basis, may clear memoization tables, and may even keep a stack of memoization tables. This facility takes advantage of our implementation where we keep one copy of each distinct ACL2 data object. Due to the functional nature of ACL2, it is sufficient to have at most one copy of any data structure; thus, a user may arrange to keep data canonicalized. This implementation extends to the entire ACL2 system the benefits enjoyed by BDDs: canonicalization, memoization, and fast equality check.
We have defined various algorithms using these features, and we have observed, in some cases, substantial performance increases. For instance, we have implemented unordered set intersection and union operations that operate in time roughly linear in the size of the sets. Without using arrays, we defined a canonical representation for Boolean functions using ACL2 objects. We have investigated the performance of rewriting and tree consensus algorithms to good effect, and we believe function memoization offers interesting opportunities to simplify algorithm definition while simultaneously providing performance improvements.
We recommend that users focus at first on the logical definitions of
hons
and other primitives rather than their underlying Common Lisp
implementations. Integrated with this memoization system is a performance
monitoring system, which can provide real-time feedback on the operation and
usefulness of hons
and function memoization. For a more detailed
description of these tools, please see the ACL2 2006 workshop paper mentioned
above.
The Fibonacci function is a small example that demonstrates the utility of function memoization. The following definition exhibits a runtime that is exponential in its input argument.
(defun fib (x) (declare (xargs :guard (natp x))) (mbe :logic (cond ((zp x) 0) ((= x 1) 1) (t (+ (fib (- x 1)) (fib (- x 2))))) :exec (if (< x 2) x (+ (fib (- x 1)) (fib (- x 2))))))
Below we show how the ACL2 time$
utility can measure the time to
execute a call to the fib
function (with some editing to avoid noise from
the underlying Common Lisp implementation). Memoize
is actually an
ACL2 macro that expands to a call of the ACL2 function used to identify a
function for memoization; see memoize. This function also accepts a
well-formed term that must be true in order for the system to memoize a call
of the memoized function; to ensure that an instance of the term is safe for
evaluation in Common Lisp, a check is performed that if the guard of the
memoized function is satisfied, then this instance will execute without
error.
ACL2 !>(time$ (fib 40)) ; (EV-REC *RETURN-LAST-ARG3* ...) took ; 0.99 seconds realtime, 0.98 seconds runtime ; (1,296 bytes allocated). 102334155 ACL2 !>(memoize 'fib) Summary Form: ( TABLE MEMOIZE-TABLE ...) Rules: NIL Time: 0.01 seconds (prove: 0.00, print: 0.00, other: 0.01) Summary Form: ( PROGN (TABLE MEMOIZE-TABLE ...) ...) Rules: NIL Time: 0.01 seconds (prove: 0.00, print: 0.00, other: 0.01) FIB ACL2 !>(time$ (fib 40)) ; (EV-REC *RETURN-LAST-ARG3* ...) took ; 0.00 seconds realtime, 0.00 seconds runtime ; (2,864 bytes allocated). 102334155 ACL2 !>(time$ (fib 100)) ; (EV-REC *RETURN-LAST-ARG3* ...) took ; 0.00 seconds realtime, 0.00 seconds runtime ; (7,024 bytes allocated). 354224848179261915075 ACL2 !>(unmemoize 'fib)
We see that once the function fib
is identified as a function for which
function calls should be memoized, the execution times are substantially
reduced. Finally, we can prevent fib
from being further memoized; in
fact, unmemoize
erases the previously memoized values.
The implementation of hash consing, memoization, and applicative hash tables involves several facets: canonical representation of ACL2 data, function memoization, and the use of Lisp hash tables to improve the performance of association list operations. We discuss each of these in turn, and we mention some subtle interrelationships. Although it is not necessary to understand the discussion in this section, it may permit some users to better use this implementation. This discussion may be confusing for some ACL2 users as it makes references to Lisp implementation features.
The ACL2 system is actually implemented as a Lisp program that is layered on top of a Common Lisp system implementation. ACL2 data constants are implemented with their corresponding counterparts in Common Lisp; that is, ACL2 cons pairs, strings, characters, numbers, and symbols are implemented with their specific Common Lisp counterparts. This choice permits a number of ACL2 primitive functions to be implemented with their corresponding Common Lisp functions, but there are indeed differences. ACL2 is a logic, and as such, it does not specify anything to do with physical storage or execution performance. When ACL2 is used, the knowledgeable user can write functions to facilitate the reuse of some previously computed values.
Recall the three mechanisms under discussion: hash consing, function
memoization, and fast association list operations. The function memoization
mechanism takes advantage of the canonical representation of data objects
provided by the hons
operation as does the fast association list
operation mechanism. Each time hons
is invoked, its arguments are
themselves converted, if necessary, to uniquely represented objects.
The ACL2 universe is recursively closed under the cons
pairing operation
and the atoms. Hash consing (hons
) is logically identical to cons
,
but a set of tables is used to record each hons
pair. When a hons
pair is requested, the implementation checks, in the current set of tables,
whether the requested pair already exists. If not, a new pair is created and
a record of that pair is made; otherwise, a reference to the previously
created pair is returned. Thus, any data object created with hons
has a
unique representation, as does every subcomponent. We also arrange for
strings to have a unique representation -- only one copy of each different
string is kept -- and when any previously unseen string is an argument to
hons
, we add the string to a unique table of strings. A system-wide
benefit of having a canonical representation for data is that equality
comparisons between any two data objects can be done in constant time.
The definition of hons
in no way changes the operation of cons
--
hons
creates a cons
pair. When asked to create a hons
, the
implementation checks to see if there is a cons
with the same car
and cdr
as the hons
being requested. Thus, the only difference
between the results of a hons
call and a corresponding cons
call is a
notation in some invisible (to the ACL2 logic) tables where we record which
cons
elements are also hons
elements. Since a hons
is nothing
but a cons
, the operation of car
and cdr
is unchanged. In fact,
the implementation is designed so that at any time it is safe to clear the
table identifying which cons
elements are also considered hons
elements.
User-defined functions with defined and verified guards can be memoized.
When a function is memoized, a user-supplied condition restricts the domain
when memoization is attempted. When the condition is satisfied, memoization
is attempted (assuming that memoization for the function is presently
enabled); otherwise, the function is just evaluated. Each memoized function
has a hash table that is used to keep track of a unique list of function
arguments paired with their values. If appropriate, for each function the
corresponding table is checked to see if a previous call with exactly the
same arguments already exists in the table: if so, then the associated value
is returned; if not, then the function is evaluated and a new key-value pair
is added to the table of memoized values so that some future call will
benefit from the memoization. With ACL2 user functions memoization can be
dynamically enabled and disabled. There is an ACL2 function that clears a
specific memoization table. And finally, just as with the hons
table, a
stack of these function memoization tables is maintained; that is, it is
possible to memoize a function, use it a bit, set the memoized values aside,
start a new table, use it, and then return to the original table.
We next discuss the fast lookup operation for association lists. When a pair
is added to an association list using the functions hons-acons
or
hons-acons!
, the system also records the key-value pair in an
associated hash table. As long as a user only used these two functions when
placing key-value pairs on an association list, the key-value pairs in the
corresponding hash table will be synchronized with the key-value pairs in the
association list. Later, if hons-get
is used to look up a key, then
instead of performing a linear search of the association list we consult the
associated hash table. If a user does not strictly follow this discipline,
then a linear search may be required. In some sense, these association lists
are much like ACL2 arrays, but without the burden of explicitly naming the
arrays. The ACL2 array compress1
function is analogous to the
functions hons-shrink-alist
and hons-shrink-alist!
. There are
user-level ACL2 functions that allow the associated hash tables to be cleared
and removed.
Finally, we would advise anyone who is using CCL in a research environment to
stay plugged into the ``trunk'' or ``bleeding edge'' of CCL development.
This is very easy to do by typing a few commands to a shell, for example
standing above the target directory as follows, provided one has svn
working.
For linux: rm -rf ccl svn co http://svn.clozure.com/publicsvn/openmcl/trunk/linuxx8664/ccl For an x86 Macintosh running the Darwin OS: svn co http://svn.clozure.com/publicsvn/openmcl/trunk/darwinx8664/ccl To keep up to date, you may find it sufficient to do: cd ccl svn update Whether obtaining a fresh CCL or just updating, finally issue these commands. ./lx86cl64 (rebuild-ccl :full t) (quit) ./lx86cl64 (rebuild-ccl :full t) (quit)
REFERENCES
Baker, Henry G., The Boyer Benchmark at Warp Speed. ACM Lisp Pointers V,3 (Jul-Sep 1992), pages 13-14.
Baker, Henry G., A Tachy 'TAK'. ACM Lisp Pointers Volume 3, July-September, 1992, pages 22-23.
Robert S. Boyer and Warren A. Hunt, Jr., Function Memoization and Unique Object Representation for ACL2 Functions, in the Sixth International Workshop on the ACL2 Theorem Prover and Its Applications, ACM Digital Library, 2006.
A. P. Ershov. On Programming of Arithmetic Operations. In the Communications of the ACM, Volume 118, Number 3, August, 1958, pages 427-430.
Eiichi Goto, Monocopy and Associative Algorithms in Extended Lisp, TR-74-03, University of Toyko, 1974.
Richard P. Gabriel. Performance and Evaluation of Lisp Systems. MIT Press, 1985.
Donald Michie. Memo functions: a Language Feature with Rote Learning Properties. Technical Report MIP-R-29, Department of Artificial Intelligence, University of Edinburgh, Scotland, 1967.
Donald Michie. Memo Functions and Machine Learning. In Nature, Volumne 218, 1968, pages 19-22.