How should a unary operator be applied in an expression grammar?

Question

I have a grammar that I've implemented in C#. My question relates to evaluating expressions. A cut-down version of the grammar, covering just expressions, is shown below. Note that the grammar also supports logical operations (using AND and OR, rather than && and ||):

Expression => [ NegOp ] <Term> { <AddOp> <Term> }
Term =>       <Power> { <MulOp> <Power> }
Power =>      <Factor> [ ^ <Factor> ]
Factor =>     <Numeric> | ( OPENPAREN <Expression> CLOSEPAREN )
Numeric =>    <Float> | <Integer>
NegOp =>      + | - | NOT
AddOp =>      + | - | OR
MulOp =>      * | / | % | AND

I've implemented Left to Right evaluation of expressions, so that, for instance:

5 / 6 / 7

is interpreted as:

(5 / 6) / 7

However, I'm unclear how to apply the NegOp to the Expression. If I have the Expression:

5 - 4 * 3    => -7

and I prefix it with a minus sign, thus:

-5 - 4 * 3

should this be interpreted as:

-(5 - 4 * 3)      => 7

or:

(-5) - 4 * 3      => -17

i.e. Should the NegOp bind tightly to the first Term in the Expression, or apply to the Expression as a whole after evaluation?

"Should the NegOp bind tightly to the first Term in the Expression, or apply to the Expression as a whole after evaluation?" Well, which behaviour do _you_ want? — Sweeper, May 11 '21 at 07:59
Hi @Sweeper, I want the behaviour that is generally accepted as the norm in computer languages. I don't know if there is a norm or this varies between languages. I was hoping for enlightenment from "one who knows" the right way. — Mark Roworth, May 11 '21 at 08:10
According to your grammar `-5 - 4 * 3` would apply the unary minus to the whole expression and `5 * -4` wouldn't be legal at all without adding parentheses. That is definitely not the norm though. — sepp2k, May 11 '21 at 08:18
Ah, I see what you mean. Would you shift the NegOp into the Factor statement, thus: ```Factor => ( [ <NegOp> ] <Numeric> ) | ( [ <NegOp> ] OPENPAREN <Expression> CLOSEPAREN )```, and remove it from the Expression statement? — Mark Roworth, May 11 '21 at 08:38
The mathematically correct form would be the most intuitive. If you assign that expression to a variable like ```int i=-5-4*3;``` you get -17 for i in C#, Java and C++... — FrankM, May 11 '21 at 08:40
Thanks @FrankM, @sepp2k, your comments make a coherent case for shifting the NegOp into the Factor rule. In the example above, the '-' works like a binary operator if you insert the implied zero, thus: ```int i=0-5-4*3```. Then it is simply a case of left-to-right evaluation. — Mark Roworth, May 11 '21 at 08:48

rici · Answer 1 · 2021-05-12T06:06:56.267

Normally, unary prefix operators are located almost at the top of the operator precedence chain, just below postfix operators. One way to do that would be to add them to what you call Factor:

Factor =>     <Numeric> | OPENPAREN <Expression> CLOSEPAREN | NegOp Factor

That could be written as { NegOp } ( <Numeric> | OPENPAREN <Expression> CLOSEPAREN ) but I feel that using "a repetition of NegOp" inappropriately separates the operator(s) from the operand. Also, some people might be inclined to use a single optional [ NegOp ] instead of arbitrary repetition, but that seems hard to justify. Is there any good reason to disallow the use of two different unary operators on the same operand? But those are decisions you'll have to make.
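As a rough illustration of the recursive Factor rule, here is a minimal recursive-descent evaluator in Python (not the asker's C# code). The names `Parser` and `evaluate` are made up for this sketch, and it assumes arithmetic operators only, so AND, OR and NOT are left out. Power is kept as in the question's grammar, a single optional `^`.

```python
import re


class Parser:
    """Sketch of the grammar with NegOp moved into Factor (arithmetic only)."""

    def __init__(self, text):
        # Numbers, or any single non-space character as an operator token.
        self.tokens = re.findall(r"\d+(?:\.\d+)?|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # Expression => Term { AddOp Term }, evaluated left to right
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # Term => Power { MulOp Power }, evaluated left to right
        value = self.power()
        while self.peek() in ("*", "/", "%"):
            op = self.advance()
            rhs = self.power()
            if op == "*":
                value = value * rhs
            elif op == "/":
                value = value / rhs
            else:
                value = value % rhs
        return value

    def power(self):
        # Power => Factor [ ^ Factor ], as in the question's grammar
        base = self.factor()
        if self.peek() == "^":
            self.advance()
            return base ** self.factor()
        return base

    def factor(self):
        # Factor => Numeric | ( Expression ) | NegOp Factor
        tok = self.advance()
        if tok == "-":
            return -self.factor()
        if tok == "+":
            return +self.factor()
        if tok == "(":
            value = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok[0].isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return float(tok) if "." in tok else int(tok)


def evaluate(text):
    parser = Parser(text)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("-5 - 4 * 3"))    # -17
print(evaluate("-(5 - 4 * 3)"))  # 7
print(evaluate("5 * -4"))        # -20
```

Because the unary operator recurses into Factor, `- -5` and `5 * -4` are both accepted, and the leading minus in `-5 - 4 * 3` applies only to the 5.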

However, in the specific case of unary minus in combination with exponentiation, it is moderately common but not universal to give exponentiation priority, since you would practically never want -2^20 to mean (-2)^20, which is mathematically the same as 2^20. I'd be inclined to go with the majority view on this, but both decisions are legitimate.

Exponentiation itself is usually right-associative, unlike other mathematical operators, again because the other grouping doesn't add anything useful. (a^b)^c is exactly the same as a^(b*c), and would usually be better computed as the second; the expression you normally want is a^(b^c). I don't know of any commonly-used language with exponentiation in which exponentiation groups to the left. Your decision to make exponentiation non-grouping also seems to me eccentric. But it's your language.
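One possible way to get both behaviours (exponentiation binding tighter than a unary minus on its left, and grouping to the right) is to split the rule in two. The rule names Unary and Atom, and the function `evaluate_power`, are assumptions of this sketch rather than part of the question's grammar. It handles only `^`, unary minus, numbers and parentheses.

```python
import re


def evaluate_power(text):
    """Sketch: Unary => '-' Unary | Power ; Power => Atom [ '^' Unary ]."""
    tokens = re.findall(r"\d+(?:\.\d+)?|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def unary():
        # The minus applies to the whole power expression that follows it.
        if peek() == "-":
            advance()
            return -unary()
        return power()

    def power():
        # Recursing through unary() on the right makes '^' right-associative
        # and also allows a signed exponent such as 2^-1.
        base = atom()
        if peek() == "^":
            advance()
            return base ** unary()
        return base

    def atom():
        tok = advance()
        if tok == "(":
            value = unary()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok[0].isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return float(tok) if "." in tok else int(tok)

    value = unary()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return value


print(evaluate_power("-2^2"))    # -4
print(evaluate_power("(-2)^2"))  # 4
print(evaluate_power("2^3^2"))   # 512
```

In a full grammar, Term would be built from Unary instead of Power, so the unary operators would sit just below exponentiation in the precedence chain.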

It's maybe also worth adding that not all prefix operators have naturally high precedence. But it depends on what you mean by "operator".

For example, most languages have constructs like

return x + y
assert i < n

These might look more like "statements" than "prefix operators", but syntactically the difference is minor. It's true that they don't return values, but there are variants which do, such as Python's yield operator.

Also, some languages have constructs like

lambda t: 42 * t         # There are many ways to spell lambda
let t = f(a) in t + 6

These are both bracketed prefix operators (in the same way that the subscript index operator v[i] is a bracketed postfix operator or that the C family's "ternary operator" is really a bracketed infix operator with low precedence). The bracketed part does contain an operand, but it's effectively fully parenthesized so you don't need to worry about it during the parse. So again they are syntactically the same as a prefix operator.

All of these operators are at (or close to) the bottom of the precedence hierarchy; they basically take "the rest of the subexpression" to be their argument. (In other words, the argument continues until terminated with a close parenthesis or whatever passes for a statement delimiter in the language.) So if you have those things, your precedence hierarchy will have prefix operators close to both ends of the precedence list. But that's fine. The parse still works.
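To make the "both ends of the precedence list" point concrete, here is a small precedence-climbing sketch that only shows grouping and does not evaluate anything. The binding-power numbers, the `lambda NAME :` spelling and the function name `group` are all assumptions chosen for illustration. Unary minus gets a higher power than any infix operator, while `lambda` gets the lowest.

```python
import re

# Hypothetical binding powers: a larger number binds more tightly.
INFIX_POWER = {"+": 10, "-": 10, "*": 20}
PREFIX_POWER = {"-": 30, "lambda": 0}


def group(text):
    """Return the expression with explicit parentheses showing the parse."""
    tokens = re.findall(r"\w+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse(min_power):
        tok = advance()
        if tok == "lambda":
            # Bracketed prefix operator: 'lambda NAME :' then the operand.
            param = advance()
            if advance() != ":":
                raise SyntaxError("expected ':'")
            body = parse(PREFIX_POWER["lambda"])  # takes the rest
            left = f"(lambda {param}: {body})"
        elif tok == "-":
            operand = parse(PREFIX_POWER["-"])    # takes only the next operand
            left = f"(-{operand})"
        elif tok == "(":
            left = parse(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok is not None and re.fullmatch(r"\w+", tok):
            left = tok
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Left-associative infix operators that bind tighter than min_power.
        while peek() in INFIX_POWER and INFIX_POWER[peek()] > min_power:
            op = advance()
            right = parse(INFIX_POWER[op])
            left = f"({left} {op} {right})"
        return left

    result = parse(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(group("- a * b"))                # ((-a) * b)
print(group("lambda t : 42 * t + 1"))  # (lambda t: ((42 * t) + 1))
print(group("2 * lambda t : t + 1"))   # (2 * (lambda t: (t + 1)))
```

The last line shows the low-precedence prefix operator swallowing everything to its right even though it appears as the operand of `*`; only a closing parenthesis or the end of input stops it.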
