Expr
Fixtype of expressions [C:6.5] [C:A.2.1].
This is a tagged union type, introduced by fty::deftagsum.
Member Tags → Types
- :ident → expr-ident
- :const → expr-const
- :string → expr-string
- :paren → expr-paren
- :gensel → expr-gensel
- :arrsub → expr-arrsub
- :funcall → expr-funcall
- :member → expr-member
- :memberp → expr-memberp
- :complit → expr-complit
- :unary → expr-unary
- :sizeof → expr-sizeof
- :sizeof-ambig → expr-sizeof-ambig
- :alignof → expr-alignof
- :cast → expr-cast
- :binary → expr-binary
- :cond → expr-cond
- :comma → expr-comma
- :cast/call-ambig → expr-cast/call-ambig
- :cast/mul-ambig → expr-cast/mul-ambig
- :cast/add-ambig → expr-cast/add-ambig
- :cast/sub-ambig → expr-cast/sub-ambig
- :cast/and-ambig → expr-cast/and-ambig
- :stmt → expr-stmt
- :tycompat → expr-tycompat
- :offsetof → expr-offsetof
- :va-arg → expr-va-arg
- :extension → expr-extension
This corresponds to expression in the grammar in [C].
Given that the abstract syntax is tree-structured,
we do not explicitly introduce the various kinds of binary expressions
defined in the grammar in [C],
and instead use a single kind of binary expression
consisting of two sub-expressions and a binary operator.
Furthermore, we always use the general fixtype for expressions
as components of other constructs where the grammar in [C]
uses more specific kinds of expressions,
like assignment-expression in generic-selection.
This means that our fixtypes are a bit more general,
but we can use separate predicates to enforce restrictions.
Identifiers may be accompanied by some additional information,
such as types calculated during validation.
This is an instance of the additional information
discussed in abstract-syntax.
In order to capture
possibly redundant parenthesization from the concrete syntax,
we include, in this fixtype, a case :paren
for an explicitly parenthesized expression.
There is a syntactic overlap between
identifier expressions (variables) and (enumeration) constants.
This ambiguity cannot be resolved purely syntactically.
During parsing, we classify those as identifier expressions,
and defer a possible re-classification to enumeration constant
during a post-parsing static semantic analysis.
Instead of a single string literal, we allow a list of them,
which should be non-empty, although we do not capture this constraint.
This mirrors the ABNF grammar;
we preserve the information about adjacent string literals,
as opposed to concatenating them into one.
The :sizeof case of this fixtype
captures sizeof applied to a type name.
The sizeof applied to an expression is instead captured
in the :unary case,
since unop includes sizeof for expressions.
As explained in amb-expr/tyname,
there is a complex syntactic overlap between expressions and type names;
thus, an expression of the form sizeof(X),
where X is in that syntactic overlap,
is inherently ambiguous.
(The simplest case is when X is an identifier,
but as explained in amb-expr/tyname
there are infinitely many cases.)
This is captured by the :sizeof-ambig case,
which contains an amb-expr/tyname.
The :alignof case of this fixtype
includes an indication of the undescore variant.
Note that the variant without underscores
represent the standard _Alignof,
not the non-existing alignof,
while the other two represent __alignof and __alignof__;
see the ABNF grammar.
Presumable, _Alignof was added to the grammar
after __alignof and __alignof__ were GCC extensions.
We use different cases, :member and :memberp
for the . and -> operators.
For compound literals, we also capture
the presence of absence of the final comma
just after the initializer-list.
We formalize initializer-list [C:6.7.9] [C:A.2.2]
as a list (which should be non-empty, unless GCC extensions are enabled)
of pairs each consisting of
some designators and an initializer (see desiniter.
The comma sequentialization operator is modeled
as its own case in this fixtype.
An alternative is to include that into binop.
Another alternative is to model it as taking
a list of expressions (it associates to the left).
But for now the current model is adequate.
The last five kinds of expressions capture
syntactically ambiguous expressions of the forms
( X ) IncDec ( E ) Pr
( X ) IncDec * E
( X ) IncDec + E
( X ) IncDec - E
( X ) IncDec & E
where X is an ambiguous expression or type name,
IncDec is a sequence of zero or more
increment and decrement operators ++ and --,
E is an expression,
and Pr is a possibly empty rest of a postfix expression.
All of this is now explained in detail.
If X is an ambiguous expression or type name
(i.e. something captured by amb-expr/tyname),
then (X) could either start a proper cast expression
(if X is a type name)
or be or start an expression that is or starts with
a parenthesized primary expression
(if X is an expression).
Note that (X) could also start a compound literal,
but in that case we would be able to disambiguate X
to be a type name and not an expression,
because an expression (X) cannot be followed by an open curly brace.
So if (X) is not followed by an open curly brace,
there are a number of other tokens that may follow:
some would let us disambiguate X
to be either a type name or an expression;
but in other cases it is not possible to disambiguate X,
and so, similarly to :sizeof-ambig,
we capture the ambiguous constructs explicitly in our abstract syntax.
But for these ambiguous casts, the situation is more complex.
In the five patterns shown above,
it is easier to ignore the IncDec part at first,
pretending it is not there.
In the first pattern, the Pr is either empty
or one or more of the constructs that
may be cascaded in postfix expressions.
For instance, Pr could have the form [3].mem(ab)++,
which consists of
an array access [3],
a member access .mem,
a function call (ab) where ab is the argument,
and a postincrement operator ++.
In this situation, if X is an expression
then (X)(E) is a function call that precedes the array access;
but if X is a type name,
then (E) is the start of a postfix expression.
Note that in general an expression E
could be a comma-separated sequence of (assignment) expressions,
but that still looks like a function call,
with multiple arguments:
see the isomorphism between the grammar rules for
argument-expression-list and expression in [C].
The two situations cannot be disambiguated purely syntactically.
So the :cast/call-ambig case of this fixtype
captures this ambiguous situation:
it is either a cast to X or a call of X,
where the X component (an amb-expr/tyname)
is either a type (name) or a function,
and where the (E)Pr component (an expression)
is either the argument of the cast
or the rest of the postfix expression.
Currently this fixtype does not capture the requirement that
the expression has in fact the form (E)Pr,
but this should be an invariant after parsing,
and it can be enforced externally to this fixtype.
The addition of IncDec maintains the ambiguity.
If X is a type name,
the sequence of operators are pre-increment/decrement ones
that are part of the argument of the cast,
and whose final argument is (E)Pr.
If instead X is an expression,
we have a postfix expression starting with X,
continuing with those as post-increment/decrement operators,
and ending with (E)Pr.
So the :cast/call-ambig case of this fixtype
includes, between the components for X and (E)Pr,
a list of zero or more increment and decrement operators.
The other four of the five patterns shown earlier
are more uniform with each other.
Again, ignore IncDec initially.
The issue here is that the operators *, +, -, &
are both unary and binary.
Thus, if X is a type name,
the * E or + E or - E or & E
is a unary expression that is the argument of the cast;
the operator is unary.
If instead X is an expression,
the operator is binary with operands (X) and E.
The cases
:cast/mul-ambig,
:cast/add-ambig,
:cast/sub-ambig, and
:cast/and-ambig of this fixtype
capture these ambiguous situations.
They are either casts or
multiplications/additions/subtractions/conjunctions.
Their first component is X,
and their last component is E.
Their middle component is a list of zero or more
increment and decrement operators that may be in between,
i.e. IncDec in the patterns shown earlier.
Their presence maintain the ambiguity:
if X is a type name,
they are pre-increment and pre-decrement operators
applied to * E or + E or - E or & E;
if X is an expression,
they are post-increment and post-decrement operators applied to X,
forming the left operand of the binary operators.
These should capture all the possible ambiguous cases.
One needs to look, in the grammar, at what can follow the (X),
where X is an ambiguous type name or expression.
The cases explained above lead to unresolvable syntactic ambiguities
(which can be resolved semantically, of course).
Other cases lead to disambiguation.
For instance, if (X) is followed by !,
then X must be a type name,
and the ! must start a unary expression
that is the argument of the cast expression:
the ! cannot follow an expression,
if X were an expression instead.
But it should be apparent that this is all very tricky;
we plan to work on a formal proof showing that
indeed the last five cases of this fixtype
captures all and only the ambiguous expressions
that start with (X)
where X is an ambiguous type name or expression.
Also see how parser handles
possibly ambiguous cast expressions.
As a GCC extension, we allow the omission of
the `then' sub-expression of a conditional expression.
See the ABNF grammar.
As a GCC extension, we include statement expressions,
i.e. expressions consisting of compound statements.
The :stmt case of this fixtype includes
the block items that comprise the compound statement.
As a GCC extension, we include calls of
the built-in function __builtin_types_compatible_p.
This is not a regular function,
because its arguments are types names, not expressions.
As a GCC extension, we include calls of
the built-in function __builtin_offsetof.
This is not a regular function,
because its first argument is a type name, not an expression.
The second argument is a member designator,
which is a restricted form of expression.
As a GCC extension, we include calls of
the built-in function __builtin_va_arg.
This is not a regular function,
because its second argument is a type name, not an expression.
The first argument is an expression.
As a GCC extesntion, we include
expressions preceded by __extension__.
See our ABNF grammar.
Subtopics
- Exprp
- Recognizer for expr structures.
- Expr-case
- Case macro for the different kinds of expr structures.
- Expr-kind
- Get the kind (tag) of a expr structure.
- Expr-binary
- Expr-equiv
- Basic equivalence relation for expr structures.
- Expr-cond
- Expr-complit
- Expr-cast/sub-ambig
- Expr-cast/mul-ambig
- Expr-cast/call-ambig
- Expr-cast/and-ambig
- Expr-cast/add-ambig
- Expr-va-arg
- Expr-unary
- Expr-tycompat
- Expr-offsetof
- Expr-memberp
- Expr-member
- Expr-ident
- Expr-gensel
- Expr-funcall
- Expr-comma
- Expr-cast
- Expr-arrsub
- Expr-alignof
- Expr-string
- Expr-stmt
- Expr-sizeof-ambig
- Expr-sizeof
- Expr-paren
- Expr-extension
- Expr-const
- Expr-fix
- Fixing function for expr structures.
- Expr-count
- Measure for recurring over expr structures.