Arrays
ACL2 arrays and operations on them
Below we begin a detailed presentation of ACL2 arrays. ACL2's
single-threaded objects (see stobj) provide similar functionality
that is generally more efficient when there are updates (writes), but is also
more restrictive.
See arrays-example for a brief introduction illustrating the use of
ACL2 arrays.
ACL2 provides relatively efficient 1- and 2-dimensional arrays. Arrays are
awkward to provide efficiently in an applicative language because the
programmer rightly expects to be able to ``modify'' an array object with the
effect of changing the behavior of the element accessing function on that
object. This, of course, does not make any sense in an applicative setting.
The element accessing function is, after all, a function, and its behavior on
a given object is immutable. To ``modify'' an array object in an applicative
setting we must actually produce a new array object. Arranging for this to be
done efficiently is a challenge to the implementors of the language. In
addition, the programmer accustomed to the von Neumann view of arrays must
learn how to use immutable applicative arrays efficiently.
In this note we explain 1-dimensional arrays. In particular, we explain
briefly how to create, access, and ``modify'' them, how they are implemented,
and how to program with them. 2-dimensional arrays are dealt with by
analogy.
The Logical Description of ACL2 Arrays
An ACL2 1-dimensional array is an object that associates arbitrary objects
with certain integers, called ``indices.'' Every array has a dimension,
dim, which is a positive integer. The indices of an array are the
consecutive integers from 0 through dim-1. To obtain the object
associated with the index i in an array a, one uses (aref1 name a
i). Name is a symbol that is irrelevant to the semantics of aref1 but affects the speed with which it computes. We will talk more about
array ``names'' later. To produce a new array object that is like a but
which associates val with index i, one uses (aset1 name a i
val).
An ACL2 1-dimensional array is actually an alist. There is no special ACL2
function for creating arrays; they are generally built with the standard list
processing functions list and cons. However, there is a
special ACL2 function, called compress1, for speeding up access to the
elements of such an alist. We discuss compress1 later.
One element of the alist must be the ``header'' of the array. The header of a 1-dimensional array with dimension dim is of the form:
(:HEADER :DIMENSIONS (dim)
         :MAXIMUM-LENGTH max
         :DEFAULT obj ; optional
         :NAME name   ; optional
         :ORDER order ; optional; order is < (the default), >, or :none/nil
         ).
Obj may be any object and is called the ``default value'' of the
array. Max must be an integer greater than dim. Name must
be a symbol. The :default, :name, and :order entries are optional;
if :default is omitted, the default value is nil. The
function header, when given a name and a 1- or 2-dimensional array,
returns the header of the array. The functions dimensions,
maximum-length, and default are similar and return the
corresponding fields of the header of the array. The role of the
:dimensions field is obvious: it specifies the legal indices into
the array. The roles played by the :maximum-length and
:default fields are described below.
Aside from the header, the other elements of the alist must each be
of the form (i . val), where i is an integer and 0 <= i < dim,
and val is an arbitrary object.
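As a small illustration of the header and pair format just described, the following sketch builds such an alist by hand and inspects its header fields. The name sketch and the stored values are assumptions chosen only for this illustration; array1p, dimensions, maximum-length, and default are the functions listed under Subtopics.

```lisp
; A hand-built 1-dimensional array of dimension 5.
; The name SKETCH and the stored values are illustrative assumptions.
(let ((a '((:header :dimensions (5)
                    :maximum-length 15
                    :default uninitialized
                    :name sketch)
           (0 . zero)
           (3 . three))))
  (list (array1p 'sketch a)           ; is this a legal 1-dimensional array?
        (dimensions 'sketch a)        ; the :dimensions field
        (maximum-length 'sketch a)    ; the :maximum-length field
        (default 'sketch a)))         ; the :default field
; we expect (T (5) 15 UNINITIALIZED)
```

Note that nothing here has been compressed, so this alist is an array only in the logical sense; no raw lisp array is associated with it yet.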
The :order field of the header is ignored for 2-dimensional arrays.
For 1-dimensional arrays, it specifies the order of keys (i, above) when
the array is compressed as with compress1, as described below. An
:order of :none or nil specifies no reordering of the alist by
compress1, and an :order of > specifies reordering by compress1 so that keys are in descending order. Otherwise, the alist is
reordered by compress1 so that keys are in ascending order.
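The effect of the :order field can be observed with a sketch like the following, in which the pairs are deliberately supplied out of order. The name sketch-order and the stored values are assumptions made for this illustration. The last element of the result is the compressed alist itself, whose keys should appear in descending order; the values seen through aref1 are unaffected by the ordering.

```lisp
; The name SKETCH-ORDER and the stored values are illustrative assumptions.
(let ((c (compress1 'sketch-order
                    '((:header :dimensions (3)
                               :maximum-length 10
                               :order >
                               :name sketch-order)
                      (0 . a) (2 . c) (1 . b)))))
  (list (aref1 'sketch-order c 0)   ; A
        (aref1 'sketch-order c 1)   ; B
        (aref1 'sketch-order c 2)   ; C
        c))                         ; the compressed alist, for inspection
```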
(Aref1 name a i) is guarded so that name must be a symbol,
a must be an array and i must be an index into a. The value of
(aref1 name a i) is either (cdr (assoc i a)) or else is the default
value of a, depending on whether there is a pair in a whose car is i. Note that name is irrelevant to the value of an aref1 expression. You might :pe aref1 to see how simple the definition
is.
(Aset1 name a i val) is guarded analogously to the aref1 expression. The value of the aset1 expression is essentially
(cons (cons i val) a). Again, name is irrelevant. Note that (aset1
name a i val) is an array, a', with the property that (aref1 name a'
i) is val and, except for index i, all other indices into a'
produce the same value as in a. Note also that if a is viewed as an
alist (which it is) the pair ``binding'' i to its old value is in a'
but ``covered up'' by the new pair. Thus, the length of an array grows by one
when aset1 is done.
Because aset1 covers old values with new ones, an array produced by
a sequence of aset1 calls may have many irrelevant pairs in it. The
function compress1 can remove these irrelevant pairs. Thus,
(compress1 name a) returns an array that is equivalent (vis-a-vis aref1) to a but which may be shorter. For technical reasons, the alist
returned by compress1 may also list the pairs in a different order
than listed in a.
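The growth caused by aset1 and the shrinking done by compress1 can be seen in a sketch like the one below. The name sketch and the stored values are assumptions made for this illustration. Each array is used only while it is the most recent result for its name, so the accesses are fast.

```lisp
; The name SKETCH and the stored values are illustrative assumptions.
(let* ((a0 (compress1 'sketch
                      '((:header :dimensions (5)
                                 :maximum-length 15
                                 :default uninitialized
                                 :name sketch)
                        (0 . zero))))
       (a1 (aset1 'sketch a0 1 'one))   ; adds the pair (1 . ONE)
       (a2 (aset1 'sketch a1 1 'uno))   ; covers (1 . ONE) with (1 . UNO)
       (a3 (compress1 'sketch a2)))     ; drops the covered pair
  (list (length a0) (length a1) (length a2) (length a3)
        (aref1 'sketch a3 1)))
; we expect (2 3 4 3 UNO)
```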
To prevent arrays from growing excessively long due to repeated aset1 operations, aset1 essentially calls compress1 on the
new alist whenever the length of the new alist exceeds the :maximum-length entry, max, in the header of the array. See
the definition of aset1 (for example by using :pe). This
is primarily just a mechanism for freeing up cons space consumed while
doing aset1 operations. Note however that this compress1 call
is replaced by a hard error if the header specifies an :order of
:none or nil.
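To see this automatic compression, one can choose a deliberately small :maximum-length, as in the following sketch. The name sketch-small and the values are assumptions made for this illustration. The third aset1 would produce an alist of length 4, which exceeds the maximum of 3, so the result is compressed.

```lisp
; The name SKETCH-SMALL and the stored values are illustrative assumptions.
(let* ((a0 (compress1 'sketch-small
                      '((:header :dimensions (2)
                                 :maximum-length 3
                                 :name sketch-small))))
       (a1 (aset1 'sketch-small a0 0 'x))   ; length 2
       (a2 (aset1 'sketch-small a1 0 'y))   ; length 3, still within the maximum
       (a3 (aset1 'sketch-small a2 0 'z)))  ; length would be 4, so compressed
  (list (length a1) (length a2) (length a3)
        (aref1 'sketch-small a3 0)))
; we expect (2 3 2 Z)
```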
This completes the logical description of 1-dimensional arrays.
2-dimensional arrays are analogous. The :dimensions entry of the
header of a 2-dimensional array should be (dim1 dim2). A pair of
indices, i and j, is legal iff 0 <= i < dim1 and 0 <= j <
dim2. The :maximum-length must be greater than dim1*dim2.
Aref2, aset2, and compress2 are like their
counterparts but take an additional index argument. Finally, the pairs
in a 2-dimensional array are of the form ((i . j) . val).
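A 2-dimensional analogue might look like the following sketch of a 2-by-3 array. The name sketch2 and the stored values are assumptions made for this illustration. Note that the :maximum-length of 7 exceeds dim1*dim2 = 6, as required.

```lisp
; The name SKETCH2 and the stored values are illustrative assumptions.
(let* ((m0 (compress2 'sketch2
                      '((:header :dimensions (2 3)
                                 :maximum-length 7
                                 :default 0
                                 :name sketch2)
                        ((0 . 0) . 5))))
       (m1 (aset2 'sketch2 m0 1 2 9)))   ; "assign" 9 at row 1, column 2
  (list (aref2 'sketch2 m1 0 0)          ; explicitly stored pair
        (aref2 'sketch2 m1 1 2)          ; pair added by aset2
        (aref2 'sketch2 m1 1 1)))        ; no pair, so the default
; we expect (5 9 0)
```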
The Implementation of ACL2 Arrays
Very informally speaking, the function compress1 ``creates'' an
ACL2 array that provides fast access, while the function aset1
``maintains'' fast access. We now describe this informal idea more
carefully.
Aref1 is essentially assoc. If aref1 were
implemented naively the time taken to access an array element would be linear
in the dimension of the array and the number of ``assignments'' to it (the
number of aset1 calls done to create the array from the initial
alist). This is intolerable; arrays are ``supposed'' to provide constant-time
access and change.
The apparently irrelevant names associated with ACL2 arrays allow us to
provide constant-time access and change when arrays are used in
``conventional'' ways. The implementation of arrays makes it clear what we
mean by ``conventional.''
Recall that array names are symbols. Behind the scenes, ACL2 associates
two objects with each ACL2 array name. The first object is called the
``semantic value'' of the name and is an alist. The second object is called
the ``raw lisp array'' and is a Common Lisp array.
When (compress1 name alist) builds a new alist, a', it sets the
semantic value of name to that new alist. Furthermore, it writes into a
Common Lisp array all of the index/value pairs of a', initializing
unassigned indices with the default value. In general this is a new array,
which becomes the raw lisp array of name. However, if a raw lisp array
is already associated with name and is at least as long as the dimension
specified in the header, then that array is reused and all indices out
of range are ignored. (Such reuse can be avoided; see flush-compress
for how to remove the existing association of a raw lisp array with a name.)
Either way, compress1 then returns a', the semantic value, as its
result, as required by the definition of compress1.
When (aref1 name a i) is invoked, aref1 first determines
whether the semantic value of name is a (i.e., is eq to the
alist a). If so, aref1 can determine the ith element of
a by invoking Common Lisp's aref function on the raw lisp array
associated with name. Note that no linear search of the alist a is
required; the operation is done in constant time and involves retrieval of two
global variables, an eq test and jump, and a raw lisp array
access. In fact, an ACL2 array access of this sort is about 5 times slower
than a C array access. On the other hand, if name has no semantic value
or if it is different from a, then aref1 determines the answer by
linear search of a as suggested by the assoc-like definition of
aref1. Thus, aref1 always returns the axiomatically specified
result. It returns in constant time if the array being accessed is the
current semantic value of the name used. The ramifications of this are
discussed after we deal with aset1.
When (aset1 name a i val) is invoked, aset1 does two conses to create the new array. Call that array a'. It will be
returned as the answer. (In this discussion we ignore the case in which
aset1 does a compress1.) However, before returning, aset1 determines if name's semantic value is a. If so, it makes
the new semantic value of name be a' and it smashes the raw lisp
array of name with val at index i, before returning a' as
the result. Thus, after doing an aset1 and obtaining a new semantic
value a', all aref1s on that new array will be fast. Any aref1s on the old semantic value, a, will be slow.
To understand the performance implications of this design, consider the
chronological sequence in which ACL2 (Common Lisp) evaluates expressions:
basically inner-most first, left-to-right, call-by-value. An array use, such
as (aref1 name a i), is ``fast'' (constant-time) if the alist supplied,
a, is the value returned by the most recently executed compress1
or aset1 on the name supplied. In the functional expression of
``conventional'' array processing, all uses of an array are fast.
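One way to picture ``conventional'' use is a function that threads the array through its recursion, always passing along the value returned by the latest aset1. The sketch below is only an illustration: the function fill-squares and the name sketch-sq are hypothetical, and no guard is declared, to keep the example short.

```lisp
; FILL-SQUARES and the name SKETCH-SQ are hypothetical, for illustration only.
(defun fill-squares (name a i)
  (declare (xargs :measure (nfix (+ 1 i))))
  (cond ((zp (1+ i)) a)                            ; done: return the latest array
        (t (fill-squares name
                         (aset1 name a i (* i i))  ; the new array is now current
                         (1- i)))))

(let* ((a (compress1 'sketch-sq
                     '((:header :dimensions (4)
                                :maximum-length 20
                                :default 0
                                :name sketch-sq))))
       (b (fill-squares 'sketch-sq a 3)))
  (list (aref1 'sketch-sq b 0) (aref1 'sketch-sq b 1)
        (aref1 'sketch-sq b 2) (aref1 'sketch-sq b 3)))
; we expect (0 1 4 9)
```

Every aset1 here is applied to the array most recently produced for the name, and the old array a is never referenced after the call of fill-squares, so each operation should take the fast path.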
The :name field of the header of an array is completely
irrelevant. Our convention is to store in that field the symbol we mean to
use as the name of the raw lisp array. But no ACL2 function inspects
:name and its primary value is that it allows the user, by inspecting the
semantic value of the array — the alist — to recall the name of
the raw array that probably holds that value. We say ``probably'' since there
is no enforcement that the alist was compressed under the name in the header or that all aset1 calls used that name. Such enforcement would be
inefficient.
Some Programming Examples
In the following examples we will use ACL2 ``global variables'' to hold
several arrays. See @, and see assign.
Let the state global variable a be the 1-dimensional
compressed array of dimension 5 constructed below.
ACL2 !>(assign a (compress1 'demo
                            '((:header :dimensions (5)
                                       :maximum-length 15
                                       :default uninitialized
                                       :name demo)
                              (0 . zero))))
Then (aref1 'demo (@ a) 0) is zero and (aref1 'demo (@ a) 1)
is uninitialized.
Now execute
ACL2 !>(assign b (aset1 'demo (@ a) 1 'one))
Then (aref1 'demo (@ b) 0) is zero and (aref1 'demo (@ b) 1)
is one.
All of the aref1s done so far have been ``fast.''
Note that we now have two array objects, one in the global variable a
and one in the global variable b. B was obtained by assigning to
a. That assignment does not affect the alist a because this is an
applicative language. Thus, (aref1 'demo (@ a) 1) must still be
uninitialized. And if you execute that expression in ACL2 you will see
that indeed it is. However, a rather ugly comment is printed, namely that
this array access is ``slow.'' The reason it is slow is that the raw lisp
array associated with the name demo is the array we are calling b.
To access the elements of a, aref1 must now do a linear search.
Any reference to a as an array is now ``unconventional;'' in a
conventional language like Ada or Common Lisp it would simply be impossible to
refer to the value of the array before the assignment that produced our
b.
Now let us define a function that counts how many times a given object,
x, occurs in an array. For simplicity, we will pass in the name and
highest index of the array:
ACL2 !>(defun cnt (name a i x)
         (declare (xargs :guard
                         (and (array1p name a)
                              (integerp i)
                              (>= i -1)
                              (< i (car (dimensions name a))))
                         :mode :logic
                         :measure (nfix (+ 1 i))))
         (cond ((zp (1+ i)) 0) ; return 0 if i is at most -1
               ((equal x (aref1 name a i))
                (1+ (cnt name a (1- i) x)))
               (t (cnt name a (1- i) x))))
To determine how many times zero appears in (@ b) we can
execute:
ACL2 !>(cnt 'demo (@ b) 4 'zero)
The answer is 1. How many times does uninitialized appear in
(@ b)?
ACL2 !>(cnt 'demo (@ b) 4 'uninitialized)
The answer is 3, because positions 2, 3 and 4 of the
array contain that default value.
Now imagine that we want to assign 'two to index 2 and then count
how many times the 2nd element of the array occurs in the array. This
specification is actually ambiguous. In assigning to b we produce a new
array, which we might call c. Do we mean to count the occurrences in
c of the 2nd element of b or the 2nd element of c? That is, do
we count the occurrences of uninitialized or the occurrences of two?
If we mean the former the correct answer is 2 (positions 3 and
4 are uninitialized in c); if we mean the latter, the correct
answer is 1 (there is only one occurrence of two in c).
Below are ACL2 renderings of the two meanings, which we call [former]
and [latter]. (Warning: Our description of these examples, and of an
example [fast former] that follows, assumes that only one of these three
examples is actually executed; for example, they are not executed in sequence.
See ``A Word of Warning'' below for more about this issue.)
(cnt 'demo (aset1 'demo (@ b) 2 'two) 4 (aref1 'demo (@ b) 2)) ; [former]
(let ((c (aset1 'demo (@ b) 2 'two))) ; [latter]
  (cnt 'demo c 4 (aref1 'demo c 2)))
Note that in [former] we create c in the second argument of the
call to cnt (although we do not give it a name) and then refer to b
in the fourth argument. This is unconventional because the second reference
to b in [former] is no longer the semantic value of demo.
While ACL2 computes the correct answer, namely 2, the execution of the
aref1 expression in [former] is done slowly.
A conventional rendering with the same meaning is
(let ((x (aref1 'demo (@ b) 2))) ; [fast former]
  (cnt 'demo (aset1 'demo (@ b) 2 'two) 4 x))
which fetches the 2nd element of b before creating c by
assignment. It is important to understand that [former] and [fast
former] mean exactly the same thing: both count the number of occurrences of
uninitialized in c. Both are legal ACL2 and both compute the same
answer, 2. Indeed, we can symbolically transform [fast former] into
[former] merely by substituting the binding of x for x in the
body of the let. But [fast former] can be evaluated faster than
[former] because all of the references to demo use the then-current
semantic value of demo, which is b in the first line and c
throughout the execution of the cnt in the second line. [Fast
former] is the preferred form, both because of its execution speed and its
clarity. If you were writing in a conventional language you would have to
write something like [fast former] because there is no way to refer to
the 2nd element of the old value of b after smashing b unless it had
been saved first.
We turn now to [latter]. It is both clear and efficient. It creates
c by assignment to b and then it fetches the 2nd element of c,
two, and proceeds to count the number of occurrences in c. The
answer is 1. [Latter] is a good example of typical ACL2 array
manipulation: after the assignment to b that creates c, c is
used throughout.
It takes a while to get used to this because most of us have grown
accustomed to the peculiar semantics of arrays in conventional languages. For
example, in raw lisp we might have written something like the following,
treating b as a ``global variable'':
(cnt 'demo (aset 'demo b 2 'two) 4 (aref 'demo b 2))
which sort of resembles [former] but actually has the semantics of
[latter] because the b from which aref fetches the 2nd element
is not the same b used in the aset! The array b is destroyed
by the aset and b henceforth refers to the array produced by the
aset, as written more clearly in [latter].
A Word of Warning: Users must exercise care when experimenting with
[former], [latter] and [fast former]. Suppose you have just
created b with the assignment shown above,
ACL2 !>(assign b (aset1 'demo (@ a) 1 'one))
If you then evaluate [former] in ACL2 it will complain that the aref1 is slow and compute the answer, as discussed. Then suppose you
evaluate [latter] in ACL2. From our discussion you might expect it to
execute fast — i.e., issue no complaint. But in fact you will find that
it complains repeatedly. The problem is that the evaluation of [former]
changed the semantic value of demo so that it is no longer b. To
try the experiment correctly you must make b be the semantic value of
demo again before the next example is evaluated. One way to do that is
to execute
ACL2 !>(assign b (compress1 'demo (@ b)))
before each expression. Because of issues like this it is often hard to
experiment with ACL2 arrays at the top-level. We find it easier to write
functions that use arrays correctly and efficiently than to use them that way
interactively.
This last assignment also illustrates a very common use of compress1. While it was introduced as a means of removing irrelevant pairs
from an array built up by repeated assignments, it is actually most useful as
a way of ensuring fast access to the elements of an array.
Many array processing tasks can be divided into two parts. During the
first part the array is built. During the second part the array is used
extensively but not modified. If your programming task can be so
divided, it might be appropriate to construct the array entirely with list
processing, thereby saving the cost of maintaining the semantic value of the
name while few references are being made. Once the alist has stabilized, it
might be worthwhile to treat it as an array by calling compress1,
thereby gaining constant-time access to it.
ACL2's theorem prover uses this technique in connection with its
implementation of the notion of whether a rune is disabled or
not. Associated with every rune is a unique integer index, called
its ``nume.'' When each rule is stored, the corresponding nume is stored as a
component of the rule. Theories are lists of runes and
membership in the ``current theory'' indicates that the corresponding rule is
enabled. But these lists are very long and membership is a linear-time
operation. So just before a proof begins we map the list of runes in
the current theory into an alist that pairs the corresponding numes with
t. Then we compress this alist into an array. Thus, given a rule we can
obtain its nume (because it is a component) and then determine in constant
time whether it is enabled. The array is never modified during the
proof, i.e., aset1 is never used in this example. From the logical
perspective this code looks quite odd: we have replaced a linear-time
membership test with an apparently linear-time assoc after going to
the trouble of mapping from a list of runes to an alist of numes. But
because the alist of numes is an array, the ``apparently linear-time assoc'' is more apparent than real; the operation is constant-time.
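The idea can be sketched as follows. This is not ACL2's own code: the helper numes-to-alist, the name sketch-ens, and the particular numes are assumptions made purely for illustration. A list of numes is mapped to an alist pairing each with t, the alist is compressed once, and membership is then tested with aref1.

```lisp
; NUMES-TO-ALIST and the name SKETCH-ENS are hypothetical; this only
; imitates the technique described above and is not ACL2's own code.
(defun numes-to-alist (numes)
  (if (endp numes)
      nil
    (cons (cons (car numes) t)
          (numes-to-alist (cdr numes)))))

(let ((ens (compress1 'sketch-ens
                      (cons '(:header :dimensions (8)
                                      :maximum-length 9
                                      :default nil
                                      :name sketch-ens)
                            (numes-to-alist '(1 4 6))))))
  (list (aref1 'sketch-ens ens 4)    ; nume 4 is in the list
        (aref1 'sketch-ens ens 5)))  ; nume 5 is not, so the default NIL
; we expect (T NIL)
```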
Subtopics
- Slow-array-warning
- A warning or error issued when arrays are used inefficiently
- Compress1
- Remove irrelevant pairs from a 1-dimensional array
- Aset1
- Set the elements of a 1-dimensional array
- Aref1
- Access the elements of a 1-dimensional array
- Flush-compress
- Flush the under-the-hood array for the given name
- Aset2
- Set the elements of a 2-dimensional array
- Compress2
- Remove irrelevant pairs from a 2-dimensional array
- Header
- Return the header of a 1- or 2-dimensional array
- Aref2
- Access the elements of a 2-dimensional array
- Maximum-length
- Return the :maximum-length from the header of an array
- Dimensions
- Return the :dimensions from the header of a 1- or 2-dimensional array
- Default
- Return the :default from the header of a 1- or 2-dimensional array
- Aset1-trusted
- Set the elements of a 1-dimensional array without invariant-risk
- Arrays-example
- An example illustrating ACL2 arrays
- Array2p
- Recognize a 2-dimensional array
- Array1p
- Recognize a 1-dimensional array
- Maybe-flush-and-compress1
- Compress a one-dimensional array only if necessary