*grammar*
The Java grammar, in ABNF.
We put together lexical and syntactic grammar.
The resulting grammar is well-formed and closed, and it only produces terminal strings of Java Unicode characters.
Proving the latter property by execution would be slow
due to the relatively large range of Java Unicode characters;
thus, we disable the executable counterpart of integers-from-to
and use library theorems that relate integers-from-to
to in and list-in.
The goal symbol of the Java syntactic grammar is
CompilationUnit [JLS14:2.3] [JLS14:7.3].
One might expect that the set of all strings derived from this goal symbol
are a superset of all the syntactically valid Java programs
(a superset, because the grammar does not capture all the requirements).
However, that is not quite the case, for the following reasons:
- The syntactic grammar uses tokens as terminals [JLS14:2.3].
No white space and no comments can be derived from CompilationUnit.
The lexical and syntactic grammars must be considered ``separately''
in order to define
(a superset of) the syntactically valid Java programs.
- Considering the lexical grammar as a whole would imply that
terminal symbols like the three forming the keyword for
have to be exactly those ASCII characters.
However, [JLS14:3.2] distinguishes three lexical translation steps,
where the first one turns Unicode escapes into Unicode characters,
which in particular turns Unicode escapes for ASCII characters
into the corresponding ASCII characters.
The OpenJDK 12 Java compiler indeed accepts u0066u006fu0072
as the keyword for.
Thus, the part of the lexical grammar
that corresponds to this first lexical translation step
must be considered separately from the rest.
We use add-const-to-untranslate-preprocess
to keep this constant unexpanded in output.
Definition: *grammar*
(defconst *grammar*
(append *lexical-grammar* *syntactic-grammar*))
Definitions and Theorems
Function: untranslate-preprocess-*grammar*
(defun untranslate-preprocess-*grammar* (acl2::term acl2::wrld)
(if (equal acl2::term (list 'quote *grammar*))
'*grammar*
(untranslate-preprocess-*syntactic-grammar*
acl2::term acl2::wrld)))
Theorem: rulelist-wfp-of-*grammar*
(defthm rulelist-wfp-of-*grammar*
(abnf::rulelist-wfp *grammar*))
Theorem: rulelist-closedp-of-*grammar*
(defthm rulelist-closedp-of-*grammar*
(abnf::rulelist-closedp *grammar*))
Theorem: unicode-only-*grammar*
(defthm unicode-only-*grammar*
(abnf::rulelist-in-termset-p *grammar* (integers-from-to 0 65535)))
Subtopics
- *grammar*-tree-operations
- Tree operations specialized to *grammar*.