Hash cons, function memoization, and applicative hash tables
This topic describes the hash cons, function memoization, and applicative hash tables features available in ACL2, sometimes called the ``hons-enabled'' features.
Bob Boyer and Warren Hunt, and later Jared Davis and Sol Swords, have developed a canonical representation for ACL2 data objects and a function memoization mechanism to facilitate reuse of previously computed results. This facility includes procedures to read and print ACL2 expressions in such a way that repetition of some ACL2 objects is eliminated, thereby permitting a kind of on-the-fly file compression. The implementation does not alter the semantics of ACL2 except to add a handful of definitions.
We historically gave the name ``ACL2(h)'' to the experimental extension of the ACL2 system that included hash cons, function memoization, and fast association lists (applicative hash tables). These features, which we call the ``hons-enabled'' features, are now present in ACL2. They are optimized for Clozure Common Lisp (CCL) and, to some extent, GNU Common Lisp (GCL ANSI), but they are supported in every ACL2 build.
Power users who want to take advantage of the hons-enabled features
of ACL2 might find it helpful to consult the document
Much of the documentation for the remainder of this topic is taken from the paper ``Function Memoization and Unique Object Representation for ACL2 Functions'' by Robert S. Boyer and Warren A. Hunt, Jr., which has appeared in the Sixth International Workshop on the ACL2 Theorem Prover and Its Applications, ACM Digital Library, 2006.
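In the implementation of the ACL2 logic, ACL2 data objects are represented
by Common Lisp objects of the same type, and the ACL2 pairing operation is
internally implemented by the Common Lisp cons function. In Common
Lisp, cons is guaranteed to return a new pair, distinct from every previously
created pair. The ACL2 function hons is logically identical to cons, but its
implementation usually reuses an existing pair whose components are identical
to the requested ones; this technique is called ``hash consing''.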
It appears that hash consing was first conceived by A. P. Ershov in 1957, to speed up the recognition of common subexpressions. Ershov showed how to collapse trees to minimal DAGs by traversing trees bottom up, and he used hashing to eliminate the re-evaluation of common subexpressions. In his 1973 PhD dissertation, L. Peter Deutsch described a program verifier that used hash consing to represent terms, and his rewriter operated on hash-consed terms. Later, Eiichi Goto implemented a Lisp system with a built-in hash consing operation: his h-CONS cells were rewrite-protected and free of duplicate copies, and Goto used this operation to facilitate the implementation of a symbolic algebra system he developed.
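The core idea, reusing an existing pair whenever one with identical components has already been built, can be illustrated with a short sketch in plain Common Lisp (run in raw Lisp, not in the ACL2 loop). This is not the ACL2 implementation: the names my-hons and *hons-table* are hypothetical, and for simplicity the sketch assumes atoms are compared with eql, so it suits symbols, characters, and numbers but not strings. Because both components are assumed to be canonical already, comparing them with eql is as good as comparing with equal, so finding an existing pair takes two hash lookups instead of a walk over its subtrees.

```lisp
;; Hypothetical hash-consing sketch in plain Common Lisp; not ACL2's HONS.
;; Assumes atoms are EQL-comparable (symbols, characters, numbers) and that
;; pair components were themselves built with MY-HONS.

(defvar *hons-table* (make-hash-table :test 'eql)
  "Maps a canonical CAR to an EQL table from canonical CDR to the unique pair.")

(defun my-hons (a b)
  "Return a pair EQUAL to (CONS A B), reusing an earlier pair with EQL
components when one exists."
  (let ((by-cdr (or (gethash a *hons-table*)
                    (setf (gethash a *hons-table*)
                          (make-hash-table :test 'eql)))))
    ;; A stored value is always a cons, never NIL, so OR is a safe test.
    (or (gethash b by-cdr)
        (setf (gethash b by-cdr) (cons a b)))))

;; Two lists built independently with MY-HONS are the very same object.
(let ((x (my-hons 1 (my-hons 2 nil)))
      (y (my-hons 1 (my-hons 2 nil))))
  (print (eq x y)))                     ; prints T
```

By contrast, two calls of (cons 1 (cons 2 nil)) always yield distinct, merely equal, structures.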
Memoizing functions also has a long history. In 1967, Donald Michie proposed using memoized functions to improve the performance of machine learning. Rote learning was improved by a learning function not forgetting what it had previously learned; this information was stored as memoized function values.
Hash consing has been used many times since. For instance, Henry Baker used hash consing to improve the performance of the well-known Boyer rewriting benchmark. Baker also used both hash consing and function memoization to improve the speed of the Takeuchi function, exactly in the spirit of our implementation, but without the automated, system-wide integration we report here.
The ACL2 implementation permits memoization of user-defined functions. During execution a user may enable or disable function memoization on an individual function basis, may clear memoization tables, and may even keep a stack of memoization tables. This facility takes advantage of our implementation, which keeps one copy of each distinct ACL2 data object. Due to the functional nature of ACL2, it is sufficient to have at most one copy of any data structure; thus, a user may arrange to keep data canonicalized. This implementation extends to the entire ACL2 system the benefits enjoyed by BDDs: canonicalization, memoization, and fast equality checks.
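The per-function controls described above (a table of remembered values, a condition restricting when memoization is attempted, an on/off switch, and clearing) can be sketched for one-argument functions as follows. This is a hypothetical illustration in plain Common Lisp, not ACL2's memoize: the structure memo and the functions memo-call and memo-clear are invented for this example. The table compares arguments with eql, which is adequate only because we assume arguments are numbers, symbols, characters, or already canonical (hash-consed) objects, mirroring how canonical representation makes equal objects identical.

```lisp
;; Hypothetical sketch of per-function memoization; not ACL2's MEMOIZE.
;; Arguments are compared with EQL, assuming they are numbers, symbols,
;; characters, or already-canonical (hash-consed) objects.

(defstruct memo
  fn                                    ; the function being memoized
  (condition-fn (constantly t))         ; memoize only calls satisfying this
  (enabled t)                           ; memoization may be switched off
  (table (make-hash-table :test 'eql))  ; argument -> remembered value
  (hits 0))                             ; count of table hits, for monitoring

(defun memo-call (m x)
  "Apply M's function to X, consulting and updating M's table when allowed."
  (if (and (memo-enabled m) (funcall (memo-condition-fn m) x))
      (multiple-value-bind (val found) (gethash x (memo-table m))
        (if found
            (progn (incf (memo-hits m)) val)
            (setf (gethash x (memo-table m))
                  (funcall (memo-fn m) x))))
      ;; Condition false or memoization disabled: just evaluate.
      (funcall (memo-fn m) x)))

(defun memo-clear (m)
  "Forget every remembered value for M."
  (clrhash (memo-table m)))
```

For example, wrapping a squaring function and calling memo-call twice on 3 evaluates the function once and records one hit; with :condition-fn #'evenp, calls on odd arguments bypass the table entirely.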
We have defined various algorithms using these features, and we have observed, in some cases, substantial performance increases. For instance, we have implemented unordered set intersection and union operations that operate in time roughly linear in the size of the sets. Without using arrays, we defined a canonical representation for Boolean functions using ACL2 objects. We have investigated the performance of rewriting and tree consensus algorithms to good effect, and we believe function memoization offers interesting opportunities to simplify algorithm definition while simultaneously providing performance improvements.
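The text does not show the authors' set operations. One standard way to obtain roughly linear-time unordered intersection and union is to record the elements of one list in a hash table and then scan the other, as in this hypothetical Common Lisp sketch (hash-intersection and hash-union are invented names). It assumes elements can be compared with eql; canonical, hash-consed representations are what would make such identity-based hashing sound for structured ACL2 elements as well.

```lisp
;; Hypothetical linear-time set operations on unordered lists.
;; Assumes elements are EQL-comparable (or canonical, hash-consed objects).

(defun hash-intersection (xs ys)
  "Elements of XS that also occur in YS, without duplicates, in XS order."
  (let ((mark (make-hash-table :test 'eql))
        (out '()))
    (dolist (y ys)
      (setf (gethash y mark) :in-ys))
    (dolist (x xs (nreverse out))
      (when (eq (gethash x mark) :in-ys)
        (setf (gethash x mark) :emitted)   ; suppress later duplicates
        (push x out)))))

(defun hash-union (xs ys)
  "Elements of XS followed by new elements of YS, without duplicates."
  (let ((seen (make-hash-table :test 'eql))
        (out '()))
    (dolist (lst (list xs ys) (nreverse out))
      (dolist (z lst)
        (unless (gethash z seen)
          (setf (gethash z seen) t)
          (push z out))))))
```

Each input list is traversed once, so with constant expected-time hash operations the total cost is linear in the combined size of the lists, rather than quadratic as with a member-based implementation.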
We recommend that users focus at first on the logical definitions of hons and other primitives rather than their underlying Common Lisp implementations. Integrated with this memoization system is a performance monitoring system, which can provide real-time feedback on the operation and usefulness of hons and function memoization. For a more detailed description of these tools, please see the ACL2 2006 workshop paper mentioned above.
The Fibonacci function is a small example that demonstrates the utility of function memoization. The following definition exhibits a runtime that is exponential in its input argument.
(defun fib (x)
  (declare (xargs :guard (natp x)))
  (mbe
   :logic
   (cond ((zp x) 0)
         ((= x 1) 1)
         (t (+ (fib (- x 1)) (fib (- x 2)))))
   :exec
   (if (< x 2)
       x
     (+ (fib (- x 1))
        (fib (- x 2))))))
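To see why memoization changes the running time so dramatically, the following plain Common Lisp sketch counts how many times a Fibonacci body is evaluated with and without a table of remembered values. The names naive-fib, memo-fib, *fib-calls*, and *fib-memo* are hypothetical, and the hand-written table only imitates what memoize arranges automatically. The naive version evaluates its body 2*F(x+1) - 1 times, where F is the Fibonacci function; for x = 20 that is 2*10946 - 1 = 21,891 evaluations, whereas the memoized version evaluates its body once per distinct argument, 21 times.

```lisp
;; Hypothetical comparison of a naive and a hand-memoized Fibonacci.
(defvar *fib-calls* 0 "Number of times a Fibonacci body has been evaluated.")

(defun naive-fib (x)
  "Exponential: recomputes the same subproblems many times."
  (incf *fib-calls*)
  (if (< x 2)
      x
      (+ (naive-fib (- x 1)) (naive-fib (- x 2)))))

(defvar *fib-memo* (make-hash-table :test 'eql)
  "Memo table: argument -> previously computed Fibonacci value.")

(defun memo-fib (x)
  "Evaluates the body at most once per distinct argument."
  (multiple-value-bind (val found) (gethash x *fib-memo*)
    (if found
        val
        (progn
          (incf *fib-calls*)
          (setf (gethash x *fib-memo*)
                (if (< x 2)
                    x
                    (+ (memo-fib (- x 1)) (memo-fib (- x 2)))))))))
```

After resetting *fib-calls* to 0, (naive-fib 20) returns 6765 with *fib-calls* at 21891; after resetting again and clearing *fib-memo*, (memo-fib 20) returns 6765 with *fib-calls* at 21, and (memo-fib 100) returns 354224848179261915075, the same value ACL2 prints for (fib 100) in the transcript below.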
Below we show how the ACL2 time$ utility can measure the time to
execute a call to the fib function, first without and then with memoization:
ACL2 !>(time$ (fib 40))
; (EV-REC *RETURN-LAST-ARG3* ...) took
; 0.99 seconds realtime, 0.98 seconds runtime
; (1,296 bytes allocated).
102334155
ACL2 !>(memoize 'fib)

Summary
Form:  ( TABLE MEMOIZE-TABLE ...)
Rules: NIL
Time:  0.01 seconds (prove: 0.00, print: 0.00, other: 0.01)

Summary
Form:  ( PROGN (TABLE MEMOIZE-TABLE ...) ...)
Rules: NIL
Time:  0.01 seconds (prove: 0.00, print: 0.00, other: 0.01)
FIB
ACL2 !>(time$ (fib 40))
; (EV-REC *RETURN-LAST-ARG3* ...) took
; 0.00 seconds realtime, 0.00 seconds runtime
; (2,864 bytes allocated).
102334155
ACL2 !>(time$ (fib 100))
; (EV-REC *RETURN-LAST-ARG3* ...) took
; 0.00 seconds realtime, 0.00 seconds runtime
; (7,024 bytes allocated).
354224848179261915075
ACL2 !>(unmemoize 'fib)
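We see that once the function fib is memoized, the time to compute (fib 40) drops from about one second to essentially zero, and even (fib 100), which the exponential definition could not compute in any reasonable time, returns immediately. The final command, (unmemoize 'fib), turns off memoization for fib.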
The implementation of hash consing, memoization, and applicative hash tables involves several facets: canonical representation of ACL2 data, function memoization, and the use of Lisp hash tables to improve the performance of association list operations. We discuss each of these in turn, and we mention some subtle interrelationships. Understanding this discussion is not necessary, but it may help some users make better use of the implementation. Some ACL2 users may find it confusing, since it refers to features of the underlying Lisp implementation.
The ACL2 system is actually implemented as a Lisp program that is layered on top of a Common Lisp system implementation. ACL2 data constants are implemented with their corresponding counterparts in Common Lisp; that is, ACL2 cons pairs, strings, characters, numbers, and symbols are implemented with their specific Common Lisp counterparts. This choice permits a number of ACL2 primitive functions to be implemented with their corresponding Common Lisp functions, but there are indeed differences. ACL2 is a logic, and as such, it says nothing about physical storage or execution performance. Still, when using ACL2, a knowledgeable user can write functions that facilitate the reuse of some previously computed values.
Recall the three mechanisms under discussion: hash consing, function
memoization, and fast association list operations. Both the function
memoization mechanism and the fast association list mechanism take advantage
of the canonical representation of data objects provided by the hons
operation. Each time
The ACL2 universe is recursively closed under the
The definition of hons in no way changes the operation of
User-defined functions with defined and verified guards can be memoized.
When a function is memoized, a user-supplied condition restricts the domain
on which memoization is attempted. When the condition is satisfied,
memoization is attempted (assuming that memoization for the function is
presently enabled); otherwise, the function is simply evaluated. Each
memoized function has a hash table that keeps track of unique argument lists
paired with their values. When memoization applies, this table is checked
for a previous call with exactly the same arguments: if one is found, the
associated value is returned; if not, the function is evaluated and a new
key-value pair is added to the table so that future calls can benefit from
the memoization. Memoization of user-defined functions can be enabled and
disabled dynamically, and there is an ACL2 function that clears a specific
memoization table. And finally, just as with the
We next discuss the fast lookup operation for association lists. When a pair is added to an association list using the functions hons-acons or hons-acons!, the system also records the key-value pair in an associated hash table. As long as a user only uses these two functions to place key-value pairs on an association list, the key-value pairs in the corresponding hash table will be synchronized with those in the association list. Later, if hons-get is used to look up a key, then instead of performing a linear search of the association list we consult the associated hash table. If a user does not strictly follow this discipline, then a linear search may be required. In some sense, these association lists are much like ACL2 arrays, but without the burden of explicitly naming the arrays. The ACL2 array function compress1 is analogous to the functions fast-alist-clean and fast-alist-clean!. There are also user-level ACL2 functions, such as fast-alist-free, that allow the associated hash tables to be cleared and removed.
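The discipline described above can be sketched with a structure that pairs an ordinary alist with a hash table indexing it. This is a hypothetical plain Common Lisp illustration, not ACL2's hons-acons or hons-get; the names fa, fa-acons, fa-get, and fa-index-from-alist are invented. As a simplifying design choice, extending an alist hands its hash table to the new, longer alist; a lookup in an older version that no longer owns a table falls back to a linear assoc search, so its answers stay correct, only slower.

```lisp
;; Hypothetical "fast alist" sketch; not ACL2's HONS-ACONS / HONS-GET.

(defstruct fa
  (alist '())   ; logical value: an ordinary association list
  (index nil))  ; EQL hash table key -> newest (key . val) pair, or NIL

(defun fa-index-from-alist (alist)
  "Build a table mirroring ALIST; earlier pairs shadow later ones, as in ASSOC."
  (let ((index (make-hash-table :test 'eql)))
    (dolist (pair alist index)
      (unless (nth-value 1 (gethash (car pair) index))
        (setf (gethash (car pair) index) pair)))))

(defun fa-acons (key val fa)
  "Like ACONS, but keep a hash table synchronized with the new alist."
  (let ((index (or (fa-index fa) (fa-index-from-alist (fa-alist fa))))
        (pair (cons key val)))
    (setf (fa-index fa) nil)            ; the old version gives up its table
    (setf (gethash key index) pair)
    (make-fa :alist (cons pair (fa-alist fa)) :index index)))

(defun fa-get (key fa)
  "Return the first pair for KEY, like ASSOC, using the table when present."
  (if (fa-index fa)
      (values (gethash key (fa-index fa)))
      (assoc key (fa-alist fa))))       ; slow path: linear search
```

Note that the logical value of every version (its alist) never changes; only which version enjoys constant-time lookup does, which is the sense in which such tables remain applicative.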
As mentioned above, the hons-enabled features are optimized for CCL and, to some extent, GCL. See ccl-updates for instructions on obtaining the latest version of CCL.
REFERENCES
Henry G. Baker. The Boyer Benchmark at Warp Speed. ACM Lisp Pointers, Volume V, Number 3 (July-September 1992), pages 13-14.
Henry G. Baker. A Tachy 'TAK'. ACM Lisp Pointers, Volume V, Number 3 (July-September 1992), pages 22-23.
Robert S. Boyer and Warren A. Hunt, Jr., Function Memoization and Unique Object Representation for ACL2 Functions, in the Sixth International Workshop on the ACL2 Theorem Prover and Its Applications, ACM Digital Library, 2006.
L.P. Deutsch. An Interactive Program Verifier. Tech. Rept. CSL-73-1, Xerox Palo Alto Research Center, May, 1973.
A. P. Ershov. On Programming of Arithmetic Operations. Communications of the ACM, Volume 1, Number 8, August 1958, pages 3-6 (translated from Doklady Akademii Nauk SSSR, Volume 118, Number 3, 1958, pages 427-430).
Eiichi Goto, Monocopy and Associative Algorithms in Extended Lisp, TR-74-03, University of Tokyo, 1974.
Richard P. Gabriel. Performance and Evaluation of Lisp Systems. MIT Press, 1985.
Donald Michie. Memo functions: a Language Feature with Rote Learning Properties. Technical Report MIP-R-29, Department of Artificial Intelligence, University of Edinburgh, Scotland, 1967.
Donald Michie. Memo Functions and Machine Learning. In Nature, Volume 218, 1968, pages 19-22.