A lexer for Verilog and SystemVerilog.
Our lexer is intended to correctly support the full syntax for Verilog-2005 and SystemVerilog-2012. There are some significant differences between these languages, e.g., Verilog has only a subset of SystemVerilog's keywords and operators. You can configure which edition of the standard is being used; see vl-loadconfig-p.
Note: our support for SystemVerilog is preliminary and may be buggy. Our parser does not yet support much of SystemVerilog, and some lexer details may change as we work to extend the parser.
This lexer is a small part of VL's loader. The input to the lexer
should already be
Conventional lexers are often implemented as a
In contrast, our top-level lexer function, vl-lex, processes the whole list of input characters that are produced by the preprocessor, and generates a whole list of tokens as output. (It also generates a list of warnings, e.g., when integer constants are truncated.)
This is obviously inefficient. On the other hand, memory is abundant and
lexing is almost intrinsically
A basic correctness result for our lexer is:
Theorem:
(defthm vl-tokenlist->etext-of-vl-lex (b* (((mv okp tokens ?warnings) (vl-lex echars :warnings warnings :config config))) (implies (and okp (force (vl-echarlist-p echars)) (force (true-listp echars))) (equal (vl-tokenlist->etext tokens) echars))))
That is, whenever the lexer successfully tokenizes its input
This theorem is not entirely satisfying. It doesn't say anything about whether we've tokenized the input correctly, i.e., in the sense of the Verilog standard. We can't really think of a reasonable way to formalize that. But it does mean that we at least accounted for every character of input, and that's probably worth something.
To make this theorem possible, our lexer produces tokens even for whitespace and comments. These tokens are not suitable as input for the parser, and should be removed using vl-kill-whitespace-and-comments before parsing begins.
Since we often want to use VL to transform or simplify Verilog source code, we don't throw away comments—instead, we collect them up into a vl-commentmap-p. We that later use comment maps to re-attach the comments to the right parts of the tree; see for instance vl-ppc-module.