It's not particularly easy to make the concepts of "left-associative" and "right-associative" precise, since they don't directly correspond to any clear grammatical feature. Still, I'll try.
Despite the lack of math layout, I tried to insert an explanation of precedence relations here, and it's the best I can do, so I won't repeat it. The basic idea is that given an operator grammar (i.e., a grammar in which no production has two non-terminals without an intervening terminal), it is possible to define precedence relations ⋖, ≐, and ⋗ between grammar symbols, and then these relations can be extended to terminals.
Put simply, if a and b are two terminals, a ⋖ b holds if there is some production in which a is followed by a non-terminal which has a derivation (possibly not immediate) in which the first terminal is b. Likewise, a ⋗ b holds if there is some production in which b follows a non-terminal which has a derivation in which the last terminal is a. And a ≐ b holds if there is some production in which a and b are either consecutive or are separated by a single non-terminal. The use of symbols which look like arithmetic comparisons is unfortunate, because none of the usual arithmetic laws apply. It is not necessary (in fact, it is rare) for a ≐ a to be true; a ≐ b does not imply b ≐ a; and it may be the case that both (or neither) of a ⋖ b and a ⋗ b are true.
An operator grammar is an operator precedence grammar iff, given any two terminals a and b, at most one of a ⋖ b, a ≐ b and a ⋗ b holds.
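As a rough sketch of the definitions above, the relations can be computed mechanically from a grammar, and the "at most one relation" test then becomes a simple check. The toy grammar, the function names (`edge_terminals`, `precedence_relations`) and the ASCII stand-ins `<`, `=`, `>` for ⋖, ≐, ⋗ are my own choices for illustration, not taken from any particular tool.

```python
# Toy operator grammar (hypothetical): no right-hand side has two adjacent
# non-terminals. Keys are non-terminals; everything else is a terminal.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}


def edge_terminals(grammar, from_left):
    """Map each non-terminal to the terminals that can be the first (or, if
    from_left is False, the last) terminal of a form derived from it."""
    result = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                seq = rhs if from_left else rhs[::-1]
                found = set()
                if seq[0] in grammar:
                    # Edge symbol is a non-terminal: inherit its terminals;
                    # the terminal right next to it also counts.
                    found |= result[seq[0]]
                    if len(seq) > 1 and seq[1] not in grammar:
                        found.add(seq[1])
                else:
                    found.add(seq[0])
                if not found <= result[nt]:
                    result[nt] |= found
                    changed = True
    return result


def precedence_relations(grammar):
    """Return {(a, b): set of relations}, using '<', '=', '>' for the
    three precedence relations between terminals a and b."""
    leading = edge_terminals(grammar, from_left=True)
    trailing = edge_terminals(grammar, from_left=False)
    rel = {}

    def add(a, b, r):
        rel.setdefault((a, b), set()).add(r)

    for productions in grammar.values():
        for rhs in productions:
            for i in range(len(rhs) - 1):
                x, y = rhs[i], rhs[i + 1]
                if x not in grammar and y not in grammar:
                    add(x, y, "=")              # consecutive terminals
                elif x not in grammar:          # terminal, then non-terminal
                    for b in leading[y]:
                        add(x, b, "<")
                    if i + 2 < len(rhs) and rhs[i + 2] not in grammar:
                        add(x, rhs[i + 2], "=")  # separated by one non-terminal
                elif y not in grammar:          # non-terminal, then terminal
                    for a in trailing[x]:
                        add(a, y, ">")
    return rel


relations = precedence_relations(GRAMMAR)
# Operator precedence grammar: no pair of terminals has two different relations.
is_operator_precedence = all(len(r) <= 1 for r in relations.values())
print(relations[("+", "+")], relations[("+", "*")], relations[("(", ")")])
print(is_operator_precedence)
```

Pairs of terminals for which no relation holds are simply absent from the dictionary, which matches the remark that neither of a ⋖ b and a ⋗ b need be true.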
If a grammar is an operator-precedence grammar, it may be possible to find an assignment of integers to terminals which makes the precedence relationships more or less correspond to integer comparisons. Precise correspondence is rarely possible, because of the rarity of a ≐ a. However, it is often possible to find two functions, f(t) and g(t), such that a ⋖ b is true if f(a) < g(b) and a ⋗ b is true if f(a) > g(b). (We don't worry about "only if", because it may be the case that no relation holds between a and b, and often a ≐ b is handled with a different mechanism: indeed, it means something radically different.)
%left and %right (the yacc/bison/lemon/... declarations) construct functions f and g. The way they do it is pretty simple. If OP (an operator) is "left-associative", that means that expr1 OP expr2 OP expr3 must be parsed as <expr1 OP expr2> OP expr3, in which case OP ⋗ OP (which you can see from the derivation). Similarly, if ROP were "right-associative", then expr1 ROP expr2 ROP expr3 must be parsed as expr1 ROP <expr2 ROP expr3>, in which case ROP ⋖ ROP.
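To see how the relation between an operator and itself decides the grouping, here is a minimal shift/reduce sketch. It is not how yacc or bison work internally; the `parse` function and its callback parameters are hypothetical, and it only handles a flat list of operands and binary operators.

```python
def parse(tokens, is_operator, relation):
    """Group a flat operand/operator sequence using a precedence relation.

    relation(a, b) returns '>' if the operator a already on the stack must be
    reduced before the incoming operator b is shifted, and '<' otherwise.
    """
    operands, pending = [], []

    def reduce():
        op = pending.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append(f"<{left} {op} {right}>")

    for tok in tokens:
        if is_operator(tok):
            # Reduce while the stacked operator takes precedence.
            while pending and relation(pending[-1], tok) == ">":
                reduce()
            pending.append(tok)
        else:
            operands.append(tok)
    while pending:
        reduce()
    return operands[0]


# OP is left-associative: OP > OP, so the first OP is reduced first.
left = parse("expr1 OP expr2 OP expr3".split(),
             lambda t: t == "OP", lambda a, b: ">")
# ROP is right-associative: ROP < ROP, so the second ROP is reduced first.
right = parse("expr1 ROP expr2 ROP expr3".split(),
              lambda t: t == "ROP", lambda a, b: "<")
print(left)
print(right)
```

The only difference between the two calls is the answer to "how does the operator relate to itself", which is exactly what %left and %right record.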
Since f and g are separate functions, this is fine: a left-associative operator will have f(OP) > g(OP) while a right-associative operator will have f(ROP) < g(ROP). This can easily be implemented by using two consecutive integers for each precedence level and assigning them to f and g in turn if the operator is right-associative, and to g and f in turn if it's left-associative. (This procedure will guarantee that f(T) is never equal to g(T). In the usual expression grammar, the only ≐ relationships are between open and close bracket-type-symbols, and these are not usually ambiguous, so in a yacc-derivative grammar it's not necessary to assign them precedence values at all. In a Floyd parser, they would be marked as ≐.)
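The two-consecutive-integers scheme can be sketched in a few lines. This is only an illustration of the idea described above, not the actual table construction used by any yacc derivative; `build_f_g`, `relation` and the sample declaration list are made up for the example.

```python
def build_f_g(levels):
    """Build f and g from declarations listed from lowest to highest
    precedence, e.g. [("left", ["+", "-"]), ("right", ["^"])]."""
    f, g = {}, {}
    for level, (assoc, tokens) in enumerate(levels):
        low, high = 2 * level, 2 * level + 1  # two consecutive integers
        for tok in tokens:
            if assoc == "right":
                f[tok], g[tok] = low, high    # f < g: tok is "less than" itself
            elif assoc == "left":
                g[tok], f[tok] = low, high    # f > g: tok is "greater than" itself
            else:
                raise ValueError(f"unknown associativity: {assoc}")
    return f, g


def relation(f, g, a, b):
    """Recover the relation between stacked operator a and incoming b."""
    if f[a] > g[b]:
        return ">"
    if f[a] < g[b]:
        return "<"
    return None  # cannot happen for tokens produced by build_f_g


# Roughly what "%left '+' '-'", "%left '*' '/'", "%right '^'" would declare.
f, g = build_f_g([
    ("left", ["+", "-"]),
    ("left", ["*", "/"]),
    ("right", ["^"]),
])
for a, b in [("+", "+"), ("^", "^"), ("+", "*"), ("*", "+")]:
    print(a, relation(f, g, a, b), b)
```

Because each level contributes one even and one odd value, and all operators on a level share the same associativity, f(a) never equals g(b) for any pair of declared tokens.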
Now, what about prefix and postfix operators? Prefix operators are always found in a production of the form [1]:
non-terminal-1: PREFIX non-terminal-2;
There is no non-terminal preceding PREFIX, so it is not possible for anything to be ⋗ PREFIX (because the definition of a ⋗ b requires that there be a non-terminal preceding b). So if PREFIX is associative at all, it must be right-associative. Similarly, postfix operators correspond to:
non-terminal-3: non-terminal-4 POSTFIX;
and thus POSTFIX, if it is associative at all, must be left-associative.
Operators may be either semantically or syntactically non-associative (in the sense that applying the operator to the result of an application of the same operator is undefined or ill-formed). For example, in C++, ++ ++ a is semantically incorrect (unless operator++() has been redefined for a in some way), but it is accepted by the grammar (in case operator++() has been redefined). On the other hand, new new T is not syntactically correct. So new is syntactically non-associative.
[1] In Floyd grammars, all non-terminals are coalesced into a single non-terminal type, usually expression. However, the definition of precedence-relations doesn't require this, so I've used different place-holders for the different non-terminal types.