Portable ASCII identifiers.
A portable ASCII identifier is a non-empty sequence of ASCII characters that:
The rationale behind this notion is the following.
[C:6.4.2] allows the following characters in identifiers:
The characters in (1) and (2) are part of the basic source character set. The characters in (3) are presumably not part of the basic source character set, because they are non-ASCII Unicode characters, while the basic source character set maps naturally to ASCII Unicode characters; we say `presumably' because [C] does not seem to explicitly prohibit the use of non-ASCII Unicode characters in the basic source character set (although it seems odd to do that). The characters in (4) may or may not be part of the basic source character set; [C] imposes no constraints on them.
Recall that [C] does not require the basic source character set to consist of ASCII characters (see source-character-set). If it does consist of ASCII characters, then the characters in (1) and (2) above are ASCII, the characters in (3) are not ASCII, and the characters in (4) may or may not be ASCII.
In our formalization of C, we currently assume that source characters include ASCII characters and the basic source character set consists of ASCII characters. Our model of identifiers uses ACL2 strings, which are a superset of the ASCII characters.
Thus, in order for our identifiers to be both ASCII and portable, they must consist exclusively of characters from (1) and (2) above, without any characters from (3) or (4). If they included characters from (3), they would not be ASCII. If they included characters from (4), they would not be portable, because those additional characters may vary across implementations.
This leads to our definition of portably ASCII identifiers, which consists of ASCII letters and digits and underscore, with the first character not a digit. They must also be distinct from keywords [C:6.4.2.1/4].