Generator of tools to generate parsing functions.
This macro generates macros, specialized to a given ABNF grammar, to generate parsing functions from the rules of the grammar. This is not a full parser generator, but it is a start towards one.
Instead of generating grammar-specific macros from this macro,
an alternative approach is to have generic parsing generation macros
that take the grammar information as an additional input.
However, this would make the macros more verbose,
also because, besides the grammar itself,
there is some additional information that
the macros generated by
This can be regarded as a meta macro, because it is a macro that generates other macro, or as a parser generator generator, in the sense that it generates tools for parser generation (although the tools do not yet form a full parser generator, in the commonly used sense of the word).
Currently the implementation of this macro does not perform very thorough input validation. This will be improved in the future.
Although this tool is fairly preliminary, it has been already useful to generate a large subset of lexing and parsing functions for real (i.e. not toy) programming languages. We plan to extend these tools more and more towards a customizable parser generator.
(defdefparse name :package ... :grammar ... :prefix ... )
Name of the generated parsing generation tools.
It must be a symbol, different from the ones in other
defdefparse calls.This symbol is inserted into the names of the generated macros, which have the form
defparse-name-... . Thus, this name is used to differentiate different sets of macros, typically generated for different grammars.
Package where the generated macros are put.
It must a string that is the name of an existing package.
If
"P" is the package name, the generated macros have names of the formp::defparse-name-... .
Grammar for which parsing functions are generated by the generated
defparse-name-... macros.It must be the name of an existing constant that contains a non-empty ABNF grammar (i.e. a value of type rulelist that is not
nil ).
Prefix of the parsing functions generated by the generated
p::defparse-name-... macros.This is often something like
parse orlex in some package"Q" , so that the generated parsing functions have names of the formq::parse-... orq::lex-... . Note that the package"Q" may differ from"P" .
Macro to define a table that maps groups in the grammar to the corresponding parsing function names.
It is called as
(defparse-name-group-table "<group1>" symbol1 ... "<groupN>" symbolN )where each
<groupI> is a group written in ABNF concrete syntax and eachsymbolI is the name of the function without the prefix specified in the:prefix input. That is, the name of the parsing function for the group described by<groupI> isprefix-symbolI .The
<group1> , ...,<groupN> strings are parsed via the verified parse-group, obtaining alternations that must be all distinct.The parsing functions
prefix-symbol1 , ...,prefix-symbolN do not exist yet whendefparse-name-group-table is called. The call merely defines the names of these functions, which are created later viadefparse-name-group (see below).
Macro to define a table that maps options in the grammar to the corresponding parsing function names.
It is called as
(defparse-name-option-table "<option1>" symbol1 ... "<optionN>" symbolN )where each
<optionI> is an option written in ABNF concrete syntax and eachsymbolI is the name of the function without the prefix specified in the:prefix input. That is, the name of the parsing function for the option described by<optionI> isprefix-symbolI .The
<option1> , ...,<optionN> strings are parsed via the verified parse-option, obtaining alternations that must be all distinct.The parsing functions
prefix-symbol1 , ...,prefix-symbolN do not exist yet whendefparse-name-option-table is called. The call merely defines the names of these functions, which are created later viadefparse-name-option (see below).
Macro to define a table that maps repetitions in the grammar to the corresponding parsing function names.
It is called as
(defparse-name-repetition-table "<repetition>" symbol1 ... "<repetitionN>" symbolN )where each
<repetitionI> is a group written in ABNF concrete syntax and eachsymbolI is the name of the function without the prefix specified in the:prefix input. That is, the name of the parsing function for the repetition described by<repetition> isprefix-symbolI .The
<repetition1> , ...,<repetitionN> strings are parsed via the verified parse-repetition, obtaining repetitions that must be all distinct.The parsing functions
prefix-symbol1 , ...,prefix-symbolN do not exist yet whendefparse-name-repetition-table is called. The call merely defines the names of these functions, which are created later viadefparse-name-*-rulename anddefparse-name-*-group (see below).
Macro to generate a parsing function for a rule name in the grammar.
It is called as
(defparse-name-rulename "<rulename>") :order ...where
<rulename> is a rule name in the grammar and:order , which is optional, specifies a permutation of the list(1 ... N) , whereN is the number of alternatives that defines the rule name in the grammar.The parsing functions for all the rule names, groups, options, and repetitions that occur in the alternatives that define the rule name must have already been generated when this call is made.
If present,
:order specifies the order in which the alternatives that define the rule name is attempted by the generated parsing function for the rule name.The name of the generated parsing function is
prefix-rulename .The generated parsing function looks up the rule name in the grammar, obtaining the alternation that defines the rule name (this includes any incremental rules; see lookup-rulename). The generated function attempts to parse each alternative in order (in the order in which they appear in the grammar, unless
:order is used, see below), backtracking if the parsing of an alternative fails, stopping when either the parsing of an alternative succeeds or there are no more alternatives (in which case parsing fails). The:order input, if present, specifies a reordering of the alternatives: the numbers1 toN denote the alternatives as they appear in the grammar; the supplied permutation provides the reordering. The reordering is useful to enforce extra-grammatical requirements that provide disambiguation (e.g. parse the longest parsable text), or for greater efficiency (e.g. parse more common alternatives first).The generated parsing function attempts to parse each alternative, which is a concatenation of repetitions, by attempting to parse the repetitions in order. Repetitions are often singletons, i.e. they are effectively elements: in this case, the function directly attempts to parse the element. An element may be a rule name, in which case the corresponding parsing function is called, whose name is known because it is derived from the rule name, similarly to the name of the current parsing function. For elements that are groups or options, or for repetitions that are not singletons, the name of the corresponding parsing function must be present in the tables introduced via the macros above.
Macro to generate a parsing function for a repetition of zero or more instances of a rule name.
It is called as
(defparse-name-*-rulename "<rulename>")where
<rulename> is a rule name in the grammar, for which a parsing function must have already been generated when this call is made.The repetition must be in the table generated via
defparse-name-repetition-table (see above). The name of the generated parsing function is the corresponding one in the table.
Macro to generate a parsing function for a group in the grammar.
It is called as
(defparse-name-group "<group>") :order ...where
<group> is a group in the grammar written in ABNF concrete syntax, and:order , which is optional, specifies a permutation of the list(1 ... N) , whereN is the number of alternatives that forms the group.The parsing functions for all the rule names, groups, options, and repetitions that occur in the group must have already been generated when this call is made.
If present,
:order specifies the order in which the alternatives that form the group is attempted by the generated parsing function for the group.The group must be in the table generated via
defparse-name-group-table (see above). The name of the generated parsing function is the corresponding one in the table.The generated parsing function works similarly to the parsing functions for rule names described above. The difference is that the alternation is the one that forms the group, instead of being retrieved from the grammar. The optional
:order input serves to change the order in which the alternatives are attempted to parse.
Macro to generate a parsing function for a repetition of zero or more instances of a group.
It is called as
(defparse-name-*-group "<group>")where
<group> is a group in the grammar written in ABNF concrete syntax, for which a parsing function must have already been generated when this call is made.The repetition must be in the table generated via
defparse-name-repetition-table (see above). The name of the generated parsing function is the corresponding one in the table.
Macro to generate a parsing function for an option in the grammar.
It is called as
(defparse-name-option "<option>") :order ...where
<option> is an option in the grammar written in ABNF concrete syntax, and:order , which is optional, specifies a permutation of the list(1 ... N) , whereN is the number of alternatives that forms the optionThe parsing functions for all the rule names, groups, options, and repetitions that occur in the option must have already been generated when this call is made.
If present,
:order specifies the order in which the alternatives that form the option is attempted by the generated parsing function for the optionThe group must be in the table generated via
defparse-name-option-table (see above). The name of the generated parsing function is the corresponding one in the table.The generated parsing function works similarly to the ones generated for groups. The key difference is that, if no alternative succeeds, the parsing function for an option succeeds, because it just means that the option was absent, while instead the parsing function for a group fails in that case.
Singleton repetitions do not need to be in the table for repetitions. Singleton repetitions are handled as their underlying elements in the generated parsing functions.
The parsing functions must be generated in the usual ACL2 order, i.e. if a parsing function must call another one, the latter must be generated before generating the former. Thus, these macros currently cannot generate mutually recursive parsing functions. This is clearly a severe limitation for typical parsing, but it is still useful for generating lexers, which often have less mutual recursion. Single recursion is supported, in the form or repetitions. We plan to extend these parser generation tools to support mutual recursion.
Not all the parsing functions need to be generated via the macros.
Some may be handwritten, e.g. if they are mutually recursive.
So long as the names of these handwritten functions
have the form
Numeric and character terminal notations are handled automatically by these parser generation tools. That is, code to parse instances of those is automatically generated; the user does not have to generate any parsing functions for those, and does not need to specify any parsing function names for those (no parsing functions are generated for those; the parsing code for those is generated within larger parsing functions).
All three tables, for groups and options and repetitions, must be defined, even if any of them is empty, i.e. if the user does not want or need to generate or use parsing functions for groups or options or repetitions.
The recursive functions generated by these parsing generation tools are proved to terminate, via the length of the input as measure. To support these termination proofs, the tools generate linear rules for each function: one says that the length of the remaining input is always less than or equal to the length of the input (whether the function succeeds or fails); the other one says that the length of the remaining input is strictly less than the length of the input when the function succeeds (i.e. when it does not return an error). Actually, this second rule is generated only for functions that are expected to consume at least some input when they succeed: this is not the case for function that parse options (because the option may be absent), function that parse zero or more repetitions (because there may be zero repetition), and for functions that parse nullable rule names, i.e. rule names whose definition may be satisfied by the empty sequence of natural numbers. For now the tools use a very conservative notion of nullability: they only regard as nullable a rule name whose definition is a repetition of zero or more of something. In any case, if a parsing function is hand-written, it must include (one or two) linear rules as appropriate, so that generated functions that call the hand-written ones can also generate similar rules, and can be proved to terminate if recursive.
The functions generated by these parser generation tools
are defined via define and include
A call of