Search-engine friendly clone of the ACL2 documentation.

Atc-tutorial

Atc-tutorial-identifiers

ATC tutorial: ACL2 representation of C identifiers.

(This page may be skipped at first reading.)

As mentioned in atc-tutorial-int-programs, C function and variable names are represented by ACL2 symbols whose names are the C function and variable names. This tutorial page explains in more detail which and how C identifiers are represented in ACL2.

[C17:5.2.1] describes two (possibly identical) character sets: one in which the C code is written (the source character set), and one whose character values are manipulated by the C code (the execution character set). C identifiers consist of characters from the source set.

The source and execution character sets may be anything, so long as they satisfy the requirements stated in [C17:5.2.1]; they do not have to be ASCII or Unicode or anything specific. However, they are required to contain certain characters that have obvious ASCII counterparts. It also seems that, given the prevalence of (supersets of) ASCII, most C implementations will indeed use (some superset of) ASCII for both the source and execution character set. This certainly appears to be the case on platforms like macOS.

Based on the above considerations, and the fact that the ACL2 character set is the ISO 8859-1 superset of ASCII, ATC assumes that the C implementation supports a superset of ASCII, and generates code that is entirely in ASCII, for greater portability. This means that the generated C identifiers, in particular, must consist of ASCII characters.

ATC generates C code with portable ASCII identifiers, i.e. identifiers that are both ASCII and portable. This notion is described and motivated in portable-ascii-identifiers. It is also stated in Section `Portable ASCII C Identifiers' of the ATC reference documentation. Portable ASCII identifiers are represented, in ACL2, by symbols whose symbol-names are exactly the C identifiers. Since, as mentioned above, ACL2 characters are a superset of ASCII, any portable ASCII C identifier may be represented by some ACL2 symbol. The symbol-package-names are ignored for this purpose: different ACL2 symbols with the same name but different package name represent the same C identifier (provided that their names are legal portable ASCII C identifiers).

The Lisp platform on which ACL2 runs is typically case-insensitive, in the sense that symbols may be written in uppercase or lowercase, and either way their names are uppercase, e.g. both (symbol-name 'abc) and (symbol-name 'ABC) yield the string "ABC". However, enclosing a symbol in vertical bars makes it case-sensitive, e.g. (symbol-name '|abc|) yields the string "abc".

In ACL2, function and variable symbols are normally written without vertical bars and with dashes to separate words. Since dashes are illegal in C identifiers, underscores should be used to separate words in ACL2 function and variable symbols that represent C function and variable names. As explained above, the absence of vertical bars would result in C identifiers with uppercase letters, which goes against typical C conventions. Therefore, it is best to use vertical bars around these symbols. Examples are |x|, |next_value|, |var_k43|, etc. In any case, any portable ASCII C identifiers, including ones with uppercase letters, are representable via ACL2 symbols.

Previous: ACL2 representation and generation of C int programs

Next: ACL2 representation and generation of multiple C functions