Search-engine friendly clone of the ACL2 documentation.

Loader

Lexer

A lexer for Verilog and SystemVerilog.

Our lexer is intended to correctly support the full syntax for Verilog-2005 and SystemVerilog-2012. There are some significant differences between these languages, e.g., Verilog has only a subset of SystemVerilog's keywords and operators. You can configure which edition of the standard is being used; see vl-loadconfig-p.

Note: our support for SystemVerilog is preliminary and may be buggy. Our parser does not yet support much of SystemVerilog, and some lexer details may change as we work to extend the parser.

Interface

This lexer is a small part of VL's loader. The input to the lexer should already be preprocessed, i.e., there should not be any grave characters (`) in the input.

Conventional lexers are often implemented as a get-token routine, the idea being that, at the parser's request, the lexer should read just enough from the input stream to provide the next token.

In contrast, our top-level lexer function, vl-lex, processes the whole list of input characters that are produced by the preprocessor, and generates a whole list of tokens as output. (It also generates a list of warnings, e.g., when integer constants are truncated.)

This is obviously inefficient. On the other hand, memory is abundant and lexing is almost intrinsically O(n). Meanwhile, our approach allows our parser to be state-free, with arbitrary lookahead, and also gives us the convenience of list-based (rather than file-based) debugging and unit testing throughout the whole process; see for instance make-test-tokens.

A basic correctness result for our lexer is:

Theorem: vl-tokenlist->etext-of-vl-lex

(defthm vl-tokenlist->etext-of-vl-lex
  (b* (((mv okp tokens ?warnings)
        (vl-lex echars
                :warnings warnings
                :config config)))
    (implies (and okp (force (vl-echarlist-p echars))
                  (force (true-listp echars)))
             (equal (vl-tokenlist->etext tokens)
                    echars))))

That is, whenever the lexer successfully tokenizes its input echars, you could flatten the resulting tokens to recover exactly the input text.

This theorem is not entirely satisfying. It doesn't say anything about whether we've tokenized the input correctly, i.e., in the sense of the Verilog standard. We can't really think of a reasonable way to formalize that. But it does mean that we at least accounted for every character of input, and that's probably worth something.

To make this theorem possible, our lexer produces tokens even for whitespace and comments. These tokens are not suitable as input for the parser, and should be removed using vl-kill-whitespace-and-comments before parsing begins.

Since we often want to use VL to transform or simplify Verilog source code, we don't throw away comments—instead, we collect them up into a vl-commentmap-p. We that later use comment maps to re-attach the comments to the right parts of the tree; see for instance vl-ppc-module.

Subtopics

Lex-strings: Handling of string literals.
Lex-identifiers: Handling of identifiers.
Vl-typo-uppercase-p
Vl-typo-number-p
Vl-typo-lowercase-p
Lex-numbers: Handling of integer numbers, real numbers, and time literals.
Chartypes: Recognizers for basic character types (e.g., whitespace characters, alphabetic characters, etc.), typically introduced with defchar.
Vl-lex: Top level lexing function.
Defchar: Introduce lexing utilites from a character recognizer.
Tokens: Representation of tokens structures that are produced by the lexer.
Lex-keywords: Handling of keywords in the lexer.
Lexstate: Low level, internal configuration for the lexer, which allows us to switch between supported Verilog editions.
Make-test-tokens: A simple way to lex an ACL2 string.
Lexer-utils: Utilities for writing lexer functions and proving theorems about them.
Lex-comments: Support for Verilog comments.
Vl-typo-uppercase-list-p: (vl-typo-uppercase-list-p x) recognizes lists where every element satisfies vl-typo-uppercase-p.
Vl-typo-lowercase-list-p: (vl-typo-lowercase-list-p x) recognizes lists where every element satisfies vl-typo-lowercase-p.
Vl-typo-number-list-p: (vl-typo-number-list-p x) recognizes lists where every element satisfies vl-typo-number-p.