
I know that exponentiation has higher precedence than unary minus. However, if I build an expression parser based on that, I still can't parse expressions like 2--3. To deal with these, I've found I also need to add unary minus handling to the factor production rule, which is one precedence level higher than exponentiation. Is this how unary minus and exponentiation are usually dealt with? I've not found anything online or in books that talks about this particular situation. I was wondering whether giving exponentiation and the unary operators equal precedence would help?

I'm hand-crafting a recursive descent parser. I tried merging the power and unary production rules together, but it didn't seem to work. What does work is the following EBNF:

factor        = '(' expression ')' | variable | number | '-' factor
power         = factor { '^' factor }
unaryTerm     = ['-' | '+'] power
term          = unaryTerm { factorOp unaryTerm }
expression    = term { termOp term }

termOp        = '+' | '-'
factorOp      = '*' | '/'
rhody

1 Answer


Unless you have unusual requirements, putting both unary minus and exponentiation in the same non-terminal will work fine, because exponentiation is right-associative (Yacc/bison syntax):

atom  : ID
      | '(' expr ')'
factor: atom
      | '-' factor
      | atom '^' factor
term  : factor
      | term '*' factor
expr  : term
      | expr '+' term
      | expr '-' term
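
For readers working with recursive descent rather than Yacc, the grammar above might be transcribed roughly as follows. This is only a sketch under some assumptions: numeric literals stand in for ID, the parser evaluates directly instead of building a tree, and the names `tokenize`, `Parser`, and `evaluate` are hypothetical, not part of the original answer.

```python
import re

# Numbers and single-character operators; leading whitespace is skipped.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*^()])")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def atom(self):
        # atom: NUMBER | '(' expr ')'   (NUMBER replaces ID; an assumption)
        if self.peek() == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        tok = self.take()
        try:
            return float(tok)
        except ValueError:
            raise SyntaxError(f"unexpected token {tok!r}") from None

    def factor(self):
        # factor: atom | '-' factor | atom '^' factor
        if self.peek() == "-":
            self.take("-")
            return -self.factor()
        base = self.atom()
        if self.peek() == "^":
            self.take("^")
            # Recursing into factor makes '^' right-associative and
            # allows a unary minus in the exponent.
            return base ** self.factor()
        return base

    def term(self):
        # term: factor | term '*' factor   (left recursion becomes a loop)
        value = self.factor()
        while self.peek() == "*":
            self.take("*")
            value *= self.factor()
        return value

    def expr(self):
        # expr: term | expr '+' term | expr '-' term
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

With this sketch, `evaluate("-2^4")` gives -16.0 and `evaluate("2^2^4")` gives 65536.0, because the minus is applied to the result of the nested `factor` call and the exponent is parsed by recursion rather than by a loop.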

Indeed, exponentiation being right-associative is virtually required for this syntax to be meaningful. Consider the alternative, with a left-associative operator.

Let's say we have two operators, ⊕ and ≀, with ⊕ being left associative and binding more tightly than ≀, so that ≀ a ⊕ b is ≀(a ⊕ b).

Since ⊕ is left-associative, we would expect a ⊕ b ⊕ c to be parsed as (a ⊕ b) ⊕ c. But then we get an oddity. Is a ⊕ ≀ b ⊕ c the same as (a ⊕ ≀b) ⊕ c or the same as a ⊕ ≀(b ⊕ c)? Both options seem to violate the simple patterns. [Note 1]

Certainly, an unambiguous grammar could be written for each case, but which one would be less surprising to a programmer who was just going by the precedence chart? The most likely result would be a style requirement that ≀ expressions always be fully parenthesized, even if the parentheses are redundant. (C style guides are full of such recommendations, and many compilers will chide you for using correct but "unintuitive" expressions.)


Notes:

  1. If you use precedence declarations, you'll get a ⊕ ≀(b ⊕ c), which might or might not be intuitive, depending on your intuitions.
rici
  • Your yacc description is interesting: you're treating unary and power at the same level, I assume that's what it implies? It looks like it will deal with the situation -2^4, which I believe should yield -16 rather than, say, +16? – rhody Dec 04 '18 at 17:24
  • @rhody: yes, that's what I said and I think it's also what you asked in your last sentence. The entire point of making exponentiation bind more tightly is to make `-2^4` be `-16`. – rici Dec 04 '18 at 17:45
  • I'm hand crafting a recursive descent parser, I tried merging the power and unary production rules together but it didn't seem to work. What does work is the following EBNF (Can't render the ebnf so lines are separated by semicolons): factor = '(' expression ')' | variable | number | '-' factor; power = factor { '^' factor } ; unaryTerm = ['-' | '+'] power; term = unaryTerm { factorOp unaryTerm }; expression = term { termOp term }; termOp = '+' | '-'; factorOp = '*' | '/'; – rhody Dec 04 '18 at 18:35
  • @rhody: I think that will produce an incorrect result for `2^-2^4`, which should work out to `1/(2^16)` rather than 65536. You could try `power = ['+' | '-'] factor { '^' ['+' | '-'] factor }` (eliminating both `unaryTerm` and the redundant unary operators in `factor`.) Although I think I'd go with `power = ['+' | '-'] factor [ '^' power]`. (Replacing the loop with a recursion there was deliberate.) – rici Dec 04 '18 at 19:17
  • I'll give it a go, I appreciate your help here. – rhody Dec 04 '18 at 19:33
  • You're right, 2^-2^4 gives the wrong result. It's computing (2^-2)^4 instead. – rhody Dec 04 '18 at 19:51
  • @rhody: in that case, your associativity may also be incorrect. Check that `2^2^4` gives you 65536, not 256. This is why I suggested recursion rather than a loop. – rici Dec 04 '18 at 21:49
  • Looks like using power = ['+' | '-'] factor [ '^' power] solved it, I'll do more extensive testing in the meantime. I much appreciate your help. – rhody Dec 05 '18 at 00:59
  • @rhody: it would make more sense to interchange the names `power` and `factor` :-) But names are arbitrary. – rici Dec 05 '18 at 01:40
  • Just wanted to add one last thing for those who might be interested: I realized I could deal with odd constructions such as 2---3^---4 by making the unary minus repeatable, i.e. power = { '+' | '-' } factor [ '^' power] – rhody Dec 05 '18 at 20:14
  • @rhody: Good point. Eventually I'll get around to adding the recursive descent EBNF to the answer. But I'm more of an LR(1) fan. – rici Dec 05 '18 at 20:51
  • I've used Yacc a lot but like the control recursive descent gives. – rhody Dec 05 '18 at 22:40
  • @rhody: everyone has their own preferences. Otherwise the world would be boring. (Personally I like not having to force grammars into LL(1). And I prefer to just build ASTs in the parser.) – rici Dec 05 '18 at 22:48
  • Actually one of my next tasks is to build the AST. – rhody Dec 06 '18 at 00:43
  • @rici's original answer worked perfectly for me in my hand-written recursive-descent parser, passes all tests including -2^4 == -16, 2^2^4 == 65536, and 2^-2^4 == 1/65536. Just thought I'd comment in case someone thought that the discussion above implied that rici's answer was incorrect; it is correct, as far as I can tell. – bhaller Jul 22 '20 at 19:20
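
The EBNF the comments converge on, power = { '+' | '-' } factor [ '^' power ], could be hand-coded along these lines. Treat it as a sketch rather than the parser either commenter actually wrote: it builds a small tuple-based AST (a made-up representation), handles integer literals only, and folds a run of signs into at most one "neg" node. The function names are hypothetical.

```python
import re

def tokenize(text):
    tokens = re.findall(r"\d+|[-+*/^()]", text)
    # Reject input containing anything other than the tokens above.
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("unexpected character in input")
    return tokens

def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def expression():
        # expression = term { termOp term }
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        # term = power { factorOp power }
        node = power()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, power())
        return node

    def power():
        # power = { '+' | '-' } factor [ '^' power ]
        negate = False
        while peek() in ("+", "-"):
            if advance() == "-":
                negate = not negate
        node = factor()
        if peek() == "^":
            advance()
            # Recursion, not a loop: this keeps '^' right-associative.
            node = ("^", node, power())
        # The sign wraps the whole power, so -2^4 means -(2^4).
        return ("neg", node) if negate else node

    def factor():
        # factor = '(' expression ')' | number
        tok = advance()
        if tok == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For example, `parse("2^-2^4")` returns `('^', 2, ('neg', ('^', 2, 4)))`, which is the grouping 2^(-(2^4)) discussed above, and `parse("2---3^---4")` is accepted because the leading signs are consumed in a loop.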