Read a character.
(read-char pstate) → (mv erp char? pos new-pstate)
Besides the character, we also return its position. No character is returned if we are at the end of the file; in this case, the returned file position is the one of the non-existent character just past the end of the file (i.e. the position of a character if we added it to the end of the file).
If a character is read, it is added to the list of read characters. See parstate.
If some character was put back earlier, we get the character directly from there, removing it from the list; see parstate. There is no change in position in the parser state in this case, because the position is the one at the start of the byte list. Otherwise, we look at the bytes.
If there are no more bytes, we have reached the end of file.
We return
Otherwise, we read the first byte, which is removed from the parser state. Since currently we only support ASCII, one byte is always enough for a character; in the future, we may generalize this to perform UTF-8 decoding.
Looking at the rules in the ABNF grammar for basic and extended characters, we see that the ASCII codes of the three non-new-line extended characters (namely dollar, at sign, and backquote) fill gaps in the ASCII codes of the basic characters, so that the codes 9, 11, 12, and 32-126 are all valid characters, which are thus returned, incrementing the position by one column. If instead the byte is the ASCII code 10 for line feed, we increment the position by one line. If instead the byte is the ASCII code 13 for carriage return, there are two cases based on whether the immediately following byte exists and is the ASCII code 10 for line feed: if it does, we consume two bytes instead of one, but we return just a line feed, since we only really need one new-line character (also in line with [C:5.2.1/3]); if it does not, we just consume the carriage return, but return a line feed, again for normalizing the new-line character. In both cases, we increment the position by one line, which also resets the column to 0 (see position-inc-line).
Note that horizontal tab, vertical tab, and form feed just increment the column number by 1 and leave the line number unchanged, like most other characters. This may not match the visual appearance, but the parser has no way to know how many columns a horizontal tab takes, or how many lines a vertical tab or form feed takes. So, at least for now, we just treat these as most other characters.
If the next byte read has any other value, we deem it illegal, and return an error message with the current file position. We intentionally exclude most ASCII control characters, except for the basic ones and for the new-line ones, since there should be little need to use those in C code. Furthermore, some are dangerous, particularly backspace, since it may make the code look different from what it is, similarly to Trojan Source in Unicode. However, if we encounter practical code that uses some of these ASCII control characters, we can easily add support for them, in the ABNF grammar and in the parser.
Function:
(defun read-char (pstate) (declare (xargs :guard (parstatep pstate))) (let ((__function__ 'read-char)) (declare (ignorable __function__)) (b* (((reterr) nil (irr-position) (irr-parstate)) ((parstate pstate) pstate) ((when (consp pstate.chars-unread)) (b* ((char+pos (car pstate.chars-unread))) (retok (char+position->char char+pos) (char+position->position char+pos) (change-parstate pstate :chars-unread (cdr pstate.chars-unread) :chars-read (cons char+pos pstate.chars-read))))) ((unless (consp pstate.bytes)) (retok nil pstate.position (irr-parstate))) (byte (car pstate.bytes))) (cond ((or (= byte 9) (= byte 11) (= byte 12) (and (<= 32 byte) (<= byte 126))) (retok byte pstate.position (change-parstate pstate :bytes (cdr pstate.bytes) :position (position-inc-column 1 pstate.position) :chars-read (cons (make-char+position :char byte :position pstate.position) pstate.chars-read)))) ((= byte 10) (retok 10 pstate.position (change-parstate pstate :bytes (cdr pstate.bytes) :position (position-inc-line 1 pstate.position) :chars-read (cons (make-char+position :char 10 :position pstate.position) pstate.chars-read)))) ((= byte 13) (if (and (consp (cdr pstate.bytes)) (= (cadr pstate.bytes) 10)) (retok 10 pstate.position (change-parstate pstate :bytes (cddr pstate.bytes) :position (position-inc-line 1 pstate.position) :chars-read (cons (make-char+position :char 10 :position pstate.position) pstate.chars-read))) (retok 10 pstate.position (change-parstate pstate :bytes (cdr pstate.bytes) :position (position-inc-line 1 pstate.position) :chars-read (cons (make-char+position :char 10 :position pstate.position) pstate.chars-read))))) (t (reterr-msg :where (position-to-msg pstate.position) :expected "an ASCII character with code ~ in the range 9-13 or in the range 32-126" :found (char-to-msg byte)))))))
Theorem:
(defthm nat-optionp-of-read-char.char? (b* (((mv acl2::?erp ?char? acl2::?pos ?new-pstate) (read-char pstate))) (nat-optionp char?)) :rule-classes :rewrite)
Theorem:
(defthm positionp-of-read-char.pos (b* (((mv acl2::?erp ?char? acl2::?pos ?new-pstate) (read-char pstate))) (positionp pos)) :rule-classes :rewrite)
Theorem:
(defthm parstatep-of-read-char.new-pstate (b* (((mv acl2::?erp ?char? acl2::?pos ?new-pstate) (read-char pstate))) (parstatep new-pstate)) :rule-classes :rewrite)
Theorem:
(defthm parsize-of-read-char-uncond (b* (((mv acl2::?erp ?char? acl2::?pos ?new-pstate) (read-char pstate))) (<= (parsize new-pstate) (parsize pstate))) :rule-classes :linear)
Theorem:
(defthm parsize-of-read-char-cond (b* (((mv acl2::?erp ?char? acl2::?pos ?new-pstate) (read-char pstate))) (implies (and (not erp) char?) (<= (parsize new-pstate) (1- (parsize pstate))))) :rule-classes :linear)