Concrete-syntax
Concrete syntax of Yul.
The concrete syntax is defined by
an ABNF grammar based on the grammar in [Yul].
We parse the ABNF grammar into an ACL2 representation.
A complete specification of the concrete syntax of Yul
would need to complement the grammar with predicates that
define and relate the two levels of lexical and syntactic sub-grammars,
and also restrict certain grammar rules.
We plan to do that in the future.
We also provide a parser of Yul, based on our ABNF grammar.
The parser is currently not verified, but it follows the grammar closely.
The primary API for parsing Yul is
parse-yul and parse-yul-bytes.
There are currently two published grammars of Yul:
one is in [Yul: Specification of Yul];
the other is part of the Solidity grammar in [Solidity: Language Grammar].
The Yul team has told us that the former is older than the latter,
and that the plan is to have a separate Yul grammar
along the lines of the one that is part of the Solidity grammar.
For now, we formalize both grammars in ABNF,
and we parse both grammars into ACL2.
(The reason is somewhat accidental:
we initially formalized and parsed the older grammar,
after which we were told that the other grammar is newer;
but since we have the old one formalized and parsed already,
we like to keep it for now, along with the new one.
The old and new grammar have the following differences:
- The old grammar allows dots in identifiers,
while the new grammar does not.
However, the new grammar introduces a notion of path,
which is a sequence of one or more dot-separated identifiers;
paths can be used as expressions and can be assigned to.
- The old grammar includes type names, defined as identifiers,
while the new grammar does not have that notion.
In particular, no optional types are allowed by the new grammar
for literals, declared variables, and function inputs and outputs.
- The old grammar allows any expression as statement,
while the new grammar allows only function calls.
- The old grammar allows any expression
to initialize multiple variables or to assign to multiple variables,
while the new grammar allows only function calls.
- The old grammar allows leading zeros in decimal numbers,
while the new grammar disallows them.
- The old and new grammar have
somewhat different definitions of string literals.
In particular, the new grammar clarifies the underlying character set,
which was implicit in the old grammar due to the use of
a complement and a wildcard.
The old grammar only allows surrounding double quotes,
while the new grammar also allows surrounding single quotes.
- The new grammar adds hex strings to the possible literals.
The aforementioned parser is based on the new grammar.
Subtopics
- Lexer
- An executable lexer of Yul.
- Parser
- An executable parser of Yul.
- Grammar-new
- New ABNF grammar of Yul.
- Tokenizer
- An executable tokenizer of Yul.
- Grammar-old
- Old ABNF grammar of Yul.