Search-engine friendly clone of the ACL2 documentation.

Parser

Check-full-ppnumber

Check that the numerical constant just read is a full preprocessing number.

Signature
(check-full-ppnumber ends-in-e parstate) → (mv erp new-parstate)
Arguments: ends-in-e — Guard (booleanp ends-in-e).; parstate — Guard (parstatep parstate).
Returns: new-parstate — Type (parstatep new-parstate), given (parstatep parstate).

In C, integer and floating constants are not lexed ``directly'' according to their grammar rules in [C17]. First, preprocessing tokens must be recognized, defined by preprocessing-token in [C17:6.4] [C17:A.1.1]. These include preprocessing numbers, defined by pp-number in [C17:6.4.8] [C17:A.1.9], which start with a digit, optionally preceded by a dot, and are followed by identifier characters (including digits and letter), as well as plus and minus signs immediately preceded by exponent letters, as well as periods [C17:6.4.8/2]. Thus, preprocessing numbers lexically include all integer and floating constants [C17:6.4.8/3], and much more, e.g. 638xyyy.e+E-.

As part of translation phase 7 [C17:5.1.1.2], preprocessing tokens are converted to tokens. This includes converting preprocessing numbers to integer and floating constants, checking that they are in fact integer or floating constants, e.g. the example above would not pass the checks.

In our lexer, we lex integer and floating constants directly, but we need to ensure that the behavior is the same as if we had gone through preprocessing numbers. We do that as follow: after fully recognizing an integer or floating constant, we check whether there is a subsequent character of the kind that would be part of a preprocessing number: if there is, it is an error, because the preprocessing number cannot be converted to a constant, due to the extra character(s).

This function performs this check. It is called after an integer or floating constant has been recognized completely during lexing. This function attempts to read the next character, and unless there is no next character, or the next character is one that would not be part of the preprocessing number, an error is returned. In case of success, there is no additional result (it is just a check), except for the possibly updated parser state.

If the next character exists and is a letter or a digit or an underscore or a dot, it would be part of the preprocessing number, so we return an error. Otherwise, the check succeeds, except in one case. The case is that the next character is + or -, but the last character of the integer or floating constant just read (before calling this function) is e or E: in that case, according to the grammar rule of pp-number in [C17], the e+ or e- or -E+ or E- would be part of the preprocessing number, and thus it would cause an error: so the check in this function fails in this case. This function takes a flag saying whether the last character of the read integer or floating constant was e or E; it is passed by the caller, who has read that constant.

Note that, because of the rules on preprocessing numbers, in C this code is syntactically illegal

int x = 0xe+1; // illegal

whereas this code is syntactically legal

int x = 0xf+1; // legal

The reason is that 0xe+1 is a whole preprocessing number, while 0xf+1 is not; the latter is initially lexed as the preprocessing number 0xf followed by the punctuator + followed by the preprocesing number 1. These three can all be successfully converted fron preprocessing tokens to tokens; in particular, 0xf is a valid hexadecimal integer constant. But 0xe+1 is not a hexadecimal (or integer) constant, and so it cannot be converted to one.

Definitions and Theorems

Function: check-full-ppnumber

(defun check-full-ppnumber (ends-in-e parstate)
 (declare (xargs :stobjs (parstate)))
 (declare (xargs :guard (and (booleanp ends-in-e)
                             (parstatep parstate))))
 (let ((__function__ 'check-full-ppnumber))
  (declare (ignorable __function__))
  (b*
   (((reterr) parstate)
    ((erp char pos parstate)
     (read-char parstate))
    ((when (not char)) (retok parstate))
    ((when (or (and (<= (char-code #\A) char)
                    (<= char (char-code #\Z)))
               (and (<= (char-code #\a) char)
                    (<= char (char-code #\a)))
               (and (<= (char-code #\0) char)
                    (<= char (char-code #\9)))
               (= char (char-code #\_))
               (= char (char-code #\.))
               (and ends-in-e
                    (or (= char (char-code #\+))
                        (= char (char-code #\-))))))
     (reterr-msg
      :where (position-to-msg pos)
      :expected
      (msg
       "a character other than ~
                                    a letter ~
                                    or a digit ~
                                    or an underscore ~
                                    or a dot~s0"
       (if ends-in-e " or a plus or a minus"
         ""))
      :found (char-to-msg char)))
    (parstate (unread-char parstate)))
   (retok parstate))))

Theorem: parstatep-of-check-full-ppnumber.new-parstate

(defthm parstatep-of-check-full-ppnumber.new-parstate
  (implies (parstatep parstate)
           (b* (((mv acl2::?erp ?new-parstate)
                 (check-full-ppnumber ends-in-e parstate)))
             (parstatep new-parstate)))
  :rule-classes :rewrite)

Theorem: parsize-of-check-full-ppnumber-uncond

(defthm parsize-of-check-full-ppnumber-uncond
  (b* (((mv acl2::?erp ?new-parstate)
        (check-full-ppnumber ends-in-e parstate)))
    (<= (parsize new-parstate)
        (parsize parstate)))
  :rule-classes :linear)