Parse-cast-expression
Parse a cast expression.
- Signature
(parse-cast-expression parstate)
→
(mv erp expr span new-parstate)
- Arguments
- parstate — Guard (parstatep parstate).
- Returns
- expr — Type (exprp expr).
- span — Type (spanp span).
- new-parstate — Type (parstatep new-parstate), given (parstatep parstate).
We read a token, and there are two cases:
- If the token is an open parenthesis,
we may have either a cast expression proper or a unary expression,
and we may need to deal with the ambiguity discussed in expr.
We describe how we handle all of this after describing the other case,
which is simpler.
- If the token is not an open parenthesis
(including the case that there is no token, at the end of file),
then we must have a unary expression if we have anything,
and we call a separate function to parse that.
Note that if that function fails to find
a valid initial token for a unary expression,
the error message mentions an open parenthesis
among the expected tokens,
because a primary expression (which is a unary expression in the grammar)
may start with an open parenthesis;
this also covers the possible open parenthesis
of a cast expression proper,
and so the error message is adequate
not only for a missing unary expression,
but also for a missing cast expression.
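The two-way split on the first token can be sketched as follows. This is a minimal illustration in Python, not the ACL2 definition: the function name `first_token_path`, the result strings, and the encoding of tokens as strings (with `None` for end of file) are assumptions made for the example.

```python
from typing import Optional


def first_token_path(token: Optional[str]) -> str:
    """Pick the parsing path from the first token (None means end of file)."""
    if token == "(":
        # Cast expression proper, compound literal, or parenthesized
        # expression: more lookahead is needed to tell them apart.
        return "open-paren-case"
    # Everything else, including end of file, is delegated to the
    # unary-expression parser, which also reports the error if any.
    return "unary-expression"
```

For example, `first_token_path("sizeof")` and `first_token_path(None)` both select the unary-expression path, while `first_token_path("(")` selects the case that needs further lookahead.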
Now we describe the more complex first case above,
the one when the first token is an open parenthesis.
This may start a cast expression proper or a unary expression,
more precisely a compound literal (a kind of postfix expression),
or a parenthesized expression (a kind of primary expression).
So we must read a second token, and there are four cases:
- If the second token is an identifier,
things are still ambiguous.
The identifier may be an expression or a type name.
We describe this case in more detail below,
after describing the other three cases, which are simpler.
- If the second token may start an expression but is not an identifier,
then we have resolved the ambiguity:
we must be parsing a unary expression,
more precisely a parenthesized expression.
So we put back the second token,
we parse the expression,
and we parse the closed parenthesis.
- If the second token may start a type name but is not an identifier,
things are still ambiguous.
The parenthesized type name may be part of a cast expression proper,
or part of a compound literal.
To resolve this ambiguity,
we parse the type name,
we parse the closed parenthesis,
and then we parse a third token
(after the type name and the closed parenthesis).
If this third token is an open curly brace,
we must be parsing a compound literal:
so we call a separate function to parse (the rest of) it.
If instead this third token is not an open curly brace,
we must be parsing a cast expression proper:
we put back the token,
and we recursively parse a cast expression.
If this third token is absent, it is an error:
the message describes the possible starts of
cast expressions (same as unary expressions),
and open curly braces for compound literals.
- If the second token is none of the above, it is an error.
The message mentions all possible starts of expressions and type names:
since we have already parsed the open parenthesis,
those are all the possibilities.
Note that identifiers are the only overlap between
starts of expressions and starts of type names.
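The four cases for the second token, and the follow-up check after a parenthesized type name, can be sketched as below. This is a simplified Python illustration under stated assumptions, not the ACL2 code: the token sets are hypothetical and deliberately incomplete (for instance, they omit many C keywords), tokens are plain strings, and `None` stands for an absent token.

```python
from typing import Optional

# Hypothetical, incomplete token sets used only for this illustration.
TYPE_NAME_KEYWORDS = {"void", "char", "short", "int", "long", "float",
                      "double", "signed", "unsigned", "struct", "union",
                      "enum", "const", "volatile"}
EXPRESSION_KEYWORDS = {"sizeof", "_Alignof", "_Generic"}
EXPRESSION_PUNCTUATORS = {"(", "++", "--", "&", "*", "+", "-", "~", "!"}


def is_identifier(token: str) -> bool:
    """True for identifier-shaped tokens that are not in the keyword sets."""
    keywords = TYPE_NAME_KEYWORDS | EXPRESSION_KEYWORDS
    return token.isidentifier() and token not in keywords


def starts_expression(token: str) -> bool:
    """True for non-identifier tokens that may start an expression."""
    return (token in EXPRESSION_KEYWORDS
            or token in EXPRESSION_PUNCTUATORS
            or token[0].isdigit()      # numeric constant
            or token[0] in "\"'")      # string literal or character constant


def second_token_path(token: Optional[str]) -> str:
    """Pick the path after '(' from the second token."""
    if token is None:
        return "error"
    if is_identifier(token):
        return "identifier-case"           # still ambiguous
    if starts_expression(token):
        return "parenthesized-expression"  # ambiguity resolved
    if token in TYPE_NAME_KEYWORDS:
        return "type-name-case"            # cast proper or compound literal
    return "error"


def after_type_name_path(token: Optional[str]) -> str:
    """Pick the path after '(' type-name ')' from the third token."""
    if token is None:
        return "error"
    if token == "{":
        return "compound-literal"
    return "cast-expression"
```

Under these assumptions, the input `(int) x` follows `"type-name-case"` and then `"cast-expression"`, whereas `(int) {0}` follows `"type-name-case"` and then `"compound-literal"`.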
Now we describe the more complex first case above,
which happens when there is an identifier after the open parenthesis.
We read a third token, and there are different cases based on that:
- If this third token may start the rest of a postfix expression
(according to token-postfix-expression-rest-start-p),
then we have resolved the ambiguity:
we must be parsing a unary expression,
more precisely a parenthesized postfix expression.
We put back the third token,
we put back the identifier,
we parse the postfix expression,
and we parse the closing parenthesis.
- If this third token is a closing parenthesis,
things are still ambiguous.
We describe this case below,
after describing the next case, which is simpler.
- If this third token is anything else, or is absent (end of file),
it is an error.
The message mentions all the possible expected tokens there.
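The three cases for the third token can be sketched as follows. This is a minimal Python illustration, not the ACL2 definition: the set `POSTFIX_REST_STARTS` is an assumption derived from the C grammar for postfix expressions and may not match token-postfix-expression-rest-start-p exactly, and the function name and result strings are hypothetical.

```python
from typing import Optional

# Tokens that may continue a postfix expression after its first operand.
# Assumption: derived from the C grammar; the actual ACL2 predicate
# token-postfix-expression-rest-start-p may differ.
POSTFIX_REST_STARTS = {"[", "(", ".", "->", "++", "--"}


def third_token_path(token: Optional[str]) -> str:
    """Pick the path after '(' identifier from the third token.

    None stands for an absent token (end of file).
    """
    if token in POSTFIX_REST_STARTS:
        # Put back both tokens and parse a postfix expression, then ')'.
        return "parenthesized-postfix-expression"
    if token == ")":
        # A parenthesized identifier: still ambiguous.
        return "parenthesized-identifier-case"
    return "error"
```

For instance, in `(a[0])` the third token is `[`, so the sketch selects the postfix-expression path; in `(a)` it selects the still-ambiguous parenthesized-identifier case.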
Now we describe the more complex second case above,
when we have a parenthesized identifier.
We need to read a fourth token:
- If this fourth token is an open curly brace,
we have resolved the ambiguity.
We must be reading a compound literal
that starts with a parenthesized type name consisting of an identifier.
We put back the token,
and we call a separate ACL2 function
to finish parsing this compound literal.
- If this fourth token is a star,
that star may be either a unary operator,
in which case we must have been parsing a cast expression proper
where the identifier is a type name,
or a binary operator,
in which case we must have been parsing a multiplicative expression
where the identifier is an expression.
Either way, what follows must be a cast expression (proper or not):
see the grammar for cast and unary expressions.
If we can parse such a cast expression,
we still have a syntactic ambiguity,
which we capture in our abstract syntax,
deferring the disambiguation to post-parsing analysis;
see the discussion in expr.
- If this fourth token is a plus or minus,
it may be a unary or binary operator, similarly to the star case.
However, if it is a binary operator,
then the next expression to parse after that
is a multiplicative expression, not a cast expression.
So we parse a multiplicative expression,
and we return the appropriate syntactically ambiguous expression,
according to our abstract syntax (see expr).
- If this fourth token is an ampersand,
the handling is similar to the above cases,
but the next expression to parse is an equality one:
see the grammar rule for conjunction expressions.
- If this fourth token is none of those unary/binary operators,
but it may be the start of a (cast) expression,
then we resolve the ambiguity.
The identifier must be a type name,
and we must have been parsing a cast expression proper.
We put back the token,
and we recursively parse a cast expression.
- If none of the above cases applies,
including the case that the token is absent,
we have resolved the ambiguity.
The identifier must have been an expression,
in parentheses.
We put back the token (if present),
and we return the parenthesized expression.
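The decision on the fourth token can be summarized with the sketch below. It is a simplified Python illustration under stated assumptions, not the ACL2 code: the set `OTHER_EXPRESSION_STARTS` is hypothetical and incomplete, keywords are not distinguished from identifiers, and the result is a pair of descriptive strings rather than an abstract syntax tree.

```python
from typing import Optional, Tuple

# Hypothetical, incomplete set of punctuators and keywords that may start a
# cast expression and are not among the operators handled separately.
OTHER_EXPRESSION_STARTS = {"(", "++", "--", "~", "!",
                           "sizeof", "_Alignof", "_Generic"}


def starts_other_expression(token: str) -> bool:
    """True if the token may start a (cast) expression.

    Simplification: any identifier-shaped token counts, keywords included.
    """
    return (token in OTHER_EXPRESSION_STARTS
            or token.isidentifier()
            or token[0].isdigit()      # numeric constant
            or token[0] in "\"'")      # string literal or character constant


def fourth_token_path(token: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (outcome, what to parse next) after '(' identifier ')'.

    None stands for an absent token (end of file).
    """
    if token == "{":
        return ("compound-literal", "rest of compound literal")
    if token == "*":
        return ("ambiguous", "cast expression")
    if token in ("+", "-"):
        return ("ambiguous", "multiplicative expression")
    if token == "&":
        return ("ambiguous", "equality expression")
    if token is not None and starts_other_expression(token):
        return ("cast-expression", "cast expression")
    # Anything else, or no token: the identifier was an expression.
    return ("parenthesized-expression", None)
```

As an example, `(x) * y` yields `("ambiguous", "cast expression")`, since `x` could be a type name (cast of `*y`) or an expression (product of `x` and `y`), while `(x) y` yields `("cast-expression", "cast expression")` and `(x);` yields `("parenthesized-expression", None)`.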