Parser
A parser of C into our abstract syntax.
We provide a parser to turn C code into
the abstract syntax defined in abstract-syntax.
The parser is based on our C concrete syntax formulation
in concrete-syntax.
In line with the rationale for our abstract syntax,
the parser preserves much of the information from the concrete syntax.
Currently the parser handles all C code constructs after preprocessing;
our parser does not do any preprocessing.
We plan to extend our abstract syntax with some preprocessing constructs,
and accordingly extend our parser to recognize and preserve those.
We may also develop our own C preprocessor in the future.
Parsing C, even after preprocessing, is notoriously complicated.
There are syntactic ambiguities stemming from the fact that
an identifier may be an expression or a type name.
This is often addressed by performing
some static semantic analysis during parsing,
in order to tell apart identifier expressions and identifier type names.
Our parser instead parses the ambiguous constructs
into explicit representations of ambiguous constructs:
see abstract-syntax.
Our approach avoids the static semantic analysis during parsing,
at the cost of more complicated parsing logic,
but we prefer the cleaner separation of concerns.
The current implementation of our parser
does not capture all ambiguous constructs yet.
It is possible that our parser may reject some valid C code.
However, we plan to cover all ambiguous constructs soon.
Our parser uses recursive descent,
both for lexing and for parsing proper.
The parser is closely based on the ABNF grammar in grammar,
which should be consulted alongside the parser code.
Since that grammar is left-recursive,
we perform the usual left recursion elimination.
Although currently lexing should be context-independent
(i.e. it should be possible to lex the code and then parse it),
our parser is written so that lexing is called on the fly.
This makes it possible to accommodate context-dependent lexing,
which may be needed as we add support for some preprocessing constructs.
Our parser uses error-value tuples to handle errors; see that documentation page.
The current parser is amenable to
returning more informative error messages,
but we have already put some effort into doing that.
This parser is currently not verified, for expediency.
We plan to go back and work on verifying, or synthesizing,
components of this parser, and ideally eventually the whole parser.
This work will be based on our ABNF library and tools. Even better, we may investigate generating the parser automatically
from the grammar with suitable additional information;
The aforementioned ABNF library already has some parsing generation tools,
but they are fairly simple and preliminary,
so they would need significant extensions.
The parser is amenable to some optimizations.
For now we have favored simplicity and regularity,
but if performance turns out to be important,
we can optimize the implementation in some respects.
Even better, we could investigate applying optimizing transformations
to the current parser implementation,
or perhaps to a simpler and higher-level implementation;
this could be part of the idea of generating the parser automatically,
mentioned above.
Subtopics
- Parse-exprs/decls
- Parse expressions, declarations, and related entities.
- Parstate
- Fixtype of parser states.
- Parse-stmts/blocks
- Parse statements, blocks, and related entities.
- Check-full-ppnumber
- Check that the numerical constant just read
is a full preprocessing number.
- Read-char
- Read a character.
- Lex-oct-iconst-/-dec-fconst
- Lex-lexeme
- Lex a lexeme.
- Parse-external-declaration
- Parse an external declaration.
- Parse-cast-expression
- Parse a cast expression.
- Lex-isuffix-if-present
- Lex an integer suffix, if present.
- Lex-identifier/keyword
- Lex an identifier or keyword.
- Parse-postfix-expression
- Parse a postfix expression.
- Lex-hex-iconst/fconst
- Lex a hexadecimal integer or floating constant.
- Classify-partys/declor/ambig
- Attempt to classify what comes next in the input
as a parameter type list or a (possibly abstract) declarator.
- Lex-dec-iconst/fconst
- Lex a decimal integer or floating constant.
- Lex-block-comment
- Lex a block comment.
- Lex-escape-sequence
- Lex an escape sequence.
- Parse-declaration-or-statement
- Parse a declaration or a statement.
- Token
- Fixtype of tokens.
- Lex-*-hexadecimal-digit
- Lex zero or more hexadecimal digits, as many as available.
- Read-token
- Read a token.
- Parse-declaration
- Parse a declaration.
- Lex-*-s-char
- Lex zero or more characters and escape sequences
in a string literal.
- Lex-*-c-char
- Lex zero or more characters and escape sequences
in a character constant.
- Parse-specifier/qualifier
- Parse a specifier or qualifier.
- Lex-*-digit
- Lex zero or more (decimal) digits, as many as available.
- Lex-line-comment
- Lex a line comment.
- Char-to-msg
- Represent an optional character as a message.
- Parse-pointer
- Parse a pointer.
- Parse-expression-or-type-name
- Parse an expression or a type name.
- Parse-declaration-specifier
- Parse a declaration specifier.
- Lex-iconst/fconst
- Lex an integer or floating constant.
- Parse-*-attribute-specifier
- Parse zero or more attribute specifiers.
- Parse-declaration-specifiers
- Parse a list of one or more declaration specifiers.
- Parse-external-declaration-list
- Parse a list of one or more external declarations.
- Lex-dec-expo-if-present
- Lex a decimal exponent, if present.
- Parse-expression-rest
- Parse the rest of an expression.
- Parse-asm-name-specifier
- Parse an assembler name specifier.
- Lex-cconst
- Lex a character constant.
- Partys/declor/ambig
- Fixtype of possible classifications of certain constructs.
- Parse-declarator-or-abstract-declarator
- Parse a declarator or an abstract declarator.
- Lex-dec-fconst
- Lex a decimal floating constant.
- Position
- Fixtype of positions.
- Lexeme
- Fixtype of lexemes.
- Lex-stringlit
- Lex a string literal.
- Lex-non-octal-digit
- Lex a non-octal digit.
- Lex-fsuffix-if-present
- Lex a floating suffix, if present.
- Token-postfix-expression-rest-start-p
- Check if an optional token may start
the rest of a postfix expression.
- Parse-attribute-parameters
- Parse attribute parameters.
- Parse-type-qualifier-list
- Parse a list of one or more type qualifiers.
- Parse-translation-unit
- Parse a translation unit.
- Lex-hexadecimal-digit
- Lex a hexadecimal digit.
- Parse-statement
- Parse a statement.
- Parse-postfix-expression-rest
- Parse the rest of a postfix expression.
- Parse-fileset
- Parse a file set.
- Parse-expression
- Parse an expression.
- Make-expr-unary-with-preinc/predec-ops
- Apply to an expression
all the pre-increment and pre-decrement operators in a list.
- Parse-declaration-list
- Parse a list of one or more declarations.
- Parse-attribute-specifier
- Parse an attribute specifier.
- Lex-sign-if-present
- Lex a sign, if present.
- Lex-dec-expo
- Lex a decimal exponent.
- Lex-bin-expo
- Lex a binary exponent.
- Token-option
- Fixtype of optional tokens.
- Parse-?-asm-name-specifier
- Parse an optional assembler name specifier.
- Parse-*-stringlit
- Parse a list of zero or more string literals.
- Parse-init-declarator-list
- Parse a list of one or more initializer declarators.
- Lexeme-option
- Fixtype of optional lexemes.
- Parse-initializer-list
- Parse a list of one or more initializers.
- Parse-init-declarator
- Parse an initializer declarator.
- Token-unary-expression-start-p
- Check if an optional token may start a unary expression.
- Span
- Fixtype of spans.
- Parse-array/function-declarator
- Parse an array or function declarator.
- Parse-file
- Parse (the data bytes of) a file.
- Parse-attribute-list
- Parse a list of one or more attributes, separated by commas.
- Read-stringlit
- Read a string literal token.
- Parse-attribute
- Parse an attribute.
- Make-expr-cast/add-or-cast/sub-ambig
- Create an ambiguous cast expression based on
a token that is an additive operator.
- Lex-hex-quad
- Lex a quadruple of hexadecimal digits.
- Token-type-qualifier-p
- Check if an optional token is a type qualifier.
- Token-function-specifier-p
- Check if an optional token is a function specifier.
- Read-identifier
- Read an identifier token.
- Parse-assignment-expression
- Parse an assignment expression.
- Parse-argument-expressions
- Parse zero or more argument expressions.
- Unread-token
- Unread a token.
- Token-struct-declaration-start-p
- Check if an optional token may start a structure declaration.
- Reterr-msg
- Return an error consisting of a message
with information about what was expected and what was found where.
- Read-punctuator
- Read a specific punctuator token.
- Char+position
- Fixtype of pairs each consisting of a character and a position.
- Unread-char
- Unread a character.
- Token+span
- Fixtype of pairs each consisting of a token and a span.
- Token-expression-start-p
- Check if an optional token may start an expression.
- Read-keyword
- Read a specific keyword token.
- Parse-*-increment/decrement
- Parse zero or more increment and decrement operators.
- Parse-primary-expression
- Parse a primary expression.
- Parse-direct-abstract-declarator
- Parse a direct abstract declarator.
- Token-to-msg
- Represent a token as a message.
- Backtrack-checkpoint
- Backtrack to the latest checkpoint.
- Token-specifier/qualifier-start-p
- Check if an optional token may start a specifier or qualifier.
- Parse-parameter-declaration
- Parse a parameter declaration.
- Token-primary-expression-start-p
- Check if an optional token may start a primary expression.
- Parse-argument-expressions-rest
- Parse the rest of one or more argument expressions.
- Parse-unary-expression
- Parse a unary expression.
- Parse-generic-associations-rest
- Parse zero or more reamaining generic associations.
- Clear-checkpoint
- Clear the latest checkpoint.
- Parsize
- Size of the parsing state.
- Parse-direct-abstract-declarator-rest
- Parse the rest of a direct abstract declartor.
- Parse-designator-list
- Parse a designator list.
- Init-parstate
- Initial parser state.
- Token-type-specifier-keyword-p
- Check if an optional token is a type specifier
that consists of a single keyword.
- Token-to-type-specifier-keyword
- Map a token that is a type specifier consisting of a single keyword
to the corresponding type specifier.
- Token-designation?-initializer-start-p
- Check if an optional token may start
an initializer optionally preceded by a designation.
- Token-abstract-declarator-start-p
- Check if an optional token may start an abstract declarator.
- Token-unary-operator-p
- Check if an optional token is a unary operator.
- Token-type-specifier-start-p
- Check if an optional token may start a type specifier.
- Token-type-name-start-p
- Check if an optional token may start a type name.
- Token-to-unary-operator
- Map a token that is a unary operator
to the corresponding unary operator.
- Token-to-assignment-operator
- Map a token that is an assignment operator
to the corresponding assignment operator.
- Record-checkpoint
- Record a checkpoint for possible backtracking.
- Parse-specifier-qualifier-list
- Parse a list of one or more specifiers and qualifiers.
- Parse-parameter-declaration-list
- Parse a list of one or more parameter declarations.
- Parse-fileset-loop
- Parse-constant-expression
- Parse a constant expression.
- Parse-conditional-expression
- Parse a conditional expression.
- Unread-tokens
- Unread a specified number of tokens.
- Unread-chars
- Unread a specified number of characters.
- Token-to-storage-class-specifier
- Map a token that is a storage class specifier
to the correspoding storage class specifier.
- Token-storage-class-specifier-p
- Check if an optional token is a storage class specifier.
- Token-direct-abstract-declarator-start-p
- Check if an optional token may start a direct abstract declarator.
- Token-declarator-start-p
- Check if an optional token may start a declarator.
- Token-declaration-specifier-start-p
- Check if an optional token may start a declaration specifier.
- Parse-struct-declaration
- Parse a structure declaration.
- Parse-static-assert-declaration
- Parse a static assert declaration.
- Parse-direct-declarator
- Parse a direct declarator.
- Token-struct-declarator-start-p
- Check if an optional token may start a structure declarator.
- Token-initializer-start-p
- Check if an optional token may start an initializer.
- Token-direct-declarator-start-p
- Check if an optional token may start a direct declarator.
- Token-assignment-operator-p
- Check if an optional token is an assignment operator.
- Parse-direct-declarator-rest
- Parse the rest of a direct declarator.
- Token-to-type-qualifier
- Map a token that is a type qualifier
to the correspoding type qualifier.
- Token-to-function-specifier
- Map a token that is a function specifier
to the corresponding function specifier.
- Token-preinc/predec-operator-p
- Check if an optional token is
a preincrement or predecrement operator.
- Token-multiplicative-operator-p
- Check if an optional token is a multiplicative operator.
- Token-designator-start-p
- Check if an optional token may start a designator.
- Token-designation-start-p
- Check if an optional token may start a designation.
- Parse-struct-or-union-specifier
- Parse or structure or union specifier.
- Parse-enumerator-list
- Parse a list of one or more enumerators.
- Parse-designation?-initializer
- Parse an initializer with an optional designation.
- Parse-block-item
- Parse a block item.
- Token-to-relational-operator
- Map a token that is a relational operator
to the corresponding relational operator.
- Token-to-preinc/predec-operator
- Map a token that is a preincrement or predecrement operator
to the corresponding preincrement or predecrement operator.
- Token-to-multiplicative-operator
- Map a token that is a multiplicative operator
to the corresponding additive operator.
- Token-relational-operator-p
- Check if an optional token is a relational operator.
- Token-equality-operator-p
- Check if an optional token is an equality operator.
- Token-additive-operator-p
- Check if an optional token is an additive operator.
- Span-join
- Join two spans.
- Parse-initializer
- Parse an initializer.
- Parse-generic-association
- Parse a generic association.
- Token-to-shift-operator
- Map a token that is a shift operator
to the corresponding shift operator.
- Token-to-equality-operator
- Map a token that is an equality operator
to the corresponding equality operator.
- Token-to-additive-operator
- Map a token that is an additive operator
to the corresponding additive operator.
- Token-shift-operator-p
- Check if an optional token is a shift operator.
- Position-inc-line
- Increment a position by a number of lines.
- Position-inc-column
- Increment a position by a number of columns.
- Parse-compound-literal
- Parse a compound literal.
- Parse-array/function-abstract-declarator
- Parse an array or function abstract declarator.
- Parse-type-name
- Parse a type name.
- Parse-struct-declarator-list
- Parse a list of one or more structure declarator.
- Parse-struct-declaration-list
- Parse a list of one or more structure declarations.
- Parse-shift-expression-rest
- Parse the rest of a shift expression.
- Parse-relational-expression-rest
- Parse the rest of a relational expression.
- Parse-multiplicative-expression-rest
- Parse the rest of a multiplicative expression.
- Parse-logical-or-expression-rest
- Parse the rest of a logical disjunction expression.
- Parse-logical-and-expression-rest
- Parse the rest of a logical conjunction expression.
- Parse-inclusive-or-expression-rest
- Parse the rest of an inclusive disjunction expression.
- Parse-exclusive-or-expression-rest
- Parse the rest of an exclusive disjunction expression.
- Parse-equality-expression-rest
- Parse the rest of an equality expression.
- Parse-and-expression-rest
- Parse the rest of a conjunction expression.
- Parse-additive-expression-rest
- Parse the rest of an additive expression.
- Irr-partys/declor/ambig
- An irrelevant possible classifications of certain constructs.
- Position-to-msg
- Represent a position as a message.
- Parse-struct-declarator
- Parse a structure declarator.
- Parse-logical-or-expression
- Parse a logical disjunction expression.
- Parse-logical-and-expression
- Parse a logical conjunction expression.
- Parse-inclusive-or-expression
- Parse an inclusive disjunction expression.
- Parse-exclusive-or-expression
- Parse an exclusive disjunction expression.
- Parse-block-item-list
- Parse a list of one or more block items.
- Parse-alignment-specifier
- Parse an alignment specifier.
- Parse-abstract-declarator
- Parse an abstract declarator.
- Char+position-list
- Fixtype of lists of
pairs each consisting of a character and a position.
- Token+span-list
- Fixtype of lists of pairs each consisting of a token and a span.
- Span-to-msg
- Represent a span as a message.
- Parse-shift-expression
- Parse a shift expression.
- Parse-relational-expression
- Parse a relational expression.
- Parse-multiplicative-expression
- Parse a multiplicative expression.
- Parse-equality-expression
- Parse an equality expression.
- Parse-enum-specifier
- Parse an enumeration specifier.
- Parse-and-expression
- Parse a conjunction expression.
- Parse-additive-expression
- Parse an additive expression.
- Position-init
- Initial position in a file.
- Irr-token
- An irrelevant token.
- Irr-span
- An irrelevant span.
- Irr-position
- An irrelevant position.
- Irr-parstate
- An irrelevant parser state.
- Irr-lexeme
- An irrelevant lexeme.
- Token-list
- Fixtype of lists of tokens.
- Parse-designator
- Parse a designator.
- Parse-declarator
- Parse a declarator.
- Parse-enumerator
- Parse an enumerator.