Checksum
Assigning ``often unique'' integers to files and objects
See book-hash for a discussion of how ACL2 can use checksums
(though not by default) to increase security in the books
mechanism.
A checksum is an integer in some fixed range computed from the
printed representation of an object, e.g., the sum, modulo 2**32, of the
ascii codes of all the characters in the printed representation.
Ideally, you would like the checksum of an object to be uniquely associated
with that object, like a fingerprint. It could then be used as a convenient
way to recognize the object in the future: you could remember the
checksum (which is relatively small) and when an object is presented to you
and alleged to be the special one you could compute its checksum and see if
indeed it was. Alas, there are many more objects than checksums (after all,
each checksum is an object, and then there's t). So you try to design a
checksum algorithm that maps similar looking objects far apart, in the hopes
that corruptions and counterfeits — which appear to be similar to the
object — have different checksums. Nevertheless, the best you can do is
a many-to-one map. If an object with a different checksum is presented, you
can be positive it is not the special object. But if an object with the same
checksum is presented, you have no grounds for positive identification.
The basic checksum algorithm in ACL2 is called check-sum-obj, which
computes the checksum of an ACL2 object. Roughly speaking, we scan the print
representation of the object and, for each character encountered, we multiply
the ascii code of the character times its position in the stream (modulo a
certain prime) and then add (modulo a certain prime) that into the running
sum. This is inaccurate in many senses (for example, we don't always use the
ascii code and we see numbers as though they were printed in base 127) but
indicates the basic idea.