
I'm working on an expression parser made in Jison, which supports basic things like arithmetic, comparisons, etc. I want to allow chained comparisons like 1 < a < 10 and x == y != z. I've already implemented the logic needed to compare multiple values, but I'm struggling with the grammar: Jison keeps grouping the comparisons like (1 < a) < 10 or x == (y != z), and I can't make it recognize the whole thing as one relation.

This is roughly the grammar I have:

expressions = e EOF

e = Number
  | e + e
  | e - e
  | Relation  %prec '=='
  | ...

Relation = e RelationalOperator Relation  %prec 'CHAINED'
  | e RelationalOperator Relation         %prec 'NONCHAINED'

RelationalOperator = '==' | '!=' | ...

(Sorry, I don't know the actual Bison syntax; I use JSON. Here's the entire source.)

The operator precedence is roughly: NONCHAINED, ==, CHAINED, + and -.

I have an action set up on e → Relation, so I need that Relation to match the whole chained comparison, not only a part of it. I tried many things, including tweaking the precedence and changing the right-recursive e RelationalOperator Relation to a left-recursive Relation RelationalOperator e, but nothing worked so far. Either the parser matches only the smallest Relation possible, or it warns me that the grammar is ambiguous.


If you decide to experiment with the program, cloning it and running these commands will get you started:

git checkout develop
yarn
yarn test
m93a
  • You have two identical productions with different precedences. That's clearly ambiguous. Also, precedence is immediate; it can't "see through" a non-terminal. So you can't meaningfully assign precedences to `==`, `!=`, etc., because they are all clumped together into a single non-terminal. I don't think precedence is going to work for you with this grammar, anyway. Probably better to just use a cascading grammar. – rici Jun 18 '21 at 02:28
  • Actually, now that I've thought a bit more about it, I think it should be possible to do this with precedence declarations, if you're willing to take a bit of time figuring out how precedence declarations work. There's a comment in the code which points at the bison manual's explanation. There are lots of explanations on SO, too. And elsewhere. However, it will be more difficult to implement implicit multiplication with just precedence. So if that's where you want to go, you need to buckle down and learn the basics of left-to-right parsing. Or of PEG parsing. – rici Jun 19 '21 at 06:23
  • OK, I provided some suggestions in an answer, so that I can stop thinking about this problem. One of these days I'm going to try to write the definitive "precedence vs unambiguous grammar" answer, but not today. – rici Jun 21 '21 at 01:20

1 Answer


There are basically two relatively easy solutions to this problem:

  1. Use a cascading grammar instead of precedence declarations.

    This makes it relatively easy to write a grammar for chained comparison, and does not really complicate the grammar for binary operators nor for tight-binding unary operators.

    You'll find examples of cascading grammars all over the place, including most programming languages. A reasonably complete example is seen in this grammar for C expressions (just look at the grammar up to constant_expression:).

    One of the advantages of cascading grammars is that they let you group operators at the same precedence level into a single non-terminal, as you try to do with comparison operators and as the linked C grammar does with assignment operators. That doesn't work with precedence declarations because precedence can't "see through" a unit production; the actual token has to be visibly part of the rule with declared precedence.

    Another advantage is that if you have specific parsing needs for chained operators, you can just write the rule for the chained operators accordingly; you don't have to worry about it interfering with the rest of the grammar.

    However, cascading grammars don't really get unary operators right, unless the unary operators are all at the top of the precedence hierarchy. This can be seen in Python, which uses a cascading grammar and has several unary operators low in the precedence hierarchy, such as the not operator, leading to the following oddity:

    >>> if False == not True: print("All is well")
      File "<stdin>", line 1
        if False == not True: print("All is well")
                    ^
    SyntaxError: invalid syntax
    

    That's a syntax error because == has higher precedence than not. The cascading grammar only allows an expression to appear as the operand of an operator with lower precedence than any operator in the expression, which means that the expression not True cannot be the operand of ==. (The precedence ordering allows not a == b to be grouped as not (a == b).) That prohibition is arguably ridiculous, since there is no other possible interpretation of False == not True other than False == (not True), and the fact that the precedence ordering forbids the only possible interpretation makes the only possible interpretation a syntax error. This doesn't happen with precedence declarations, because the precedence declaration is only used if there is more than one possible parse (that is, if there is really an ambiguity).

    Your grammar puts not at the top of the precedence hierarchy, although it should really share that level with unary minus rather than being above unary minus [Note 1]. So that's not an impediment to using a cascading grammar. However, I see that you also want to implement an if … then … else operator, which is syntactically a low-precedence prefix operator. So if you wanted 4 + if x then 0 else 1 to have the value 5 when x is false (rather than being a syntax error), the cascading grammar would be problematic. You might not care about this, and if you don't, that's probably the way to go.

  2. Stick with precedence declarations and handle the chained comparison as an exception in the semantic action.

    This will allow the simplest possible grammar, but it will complicate your actions a bit. To implement it, you'll want to declare the comparison operators as left-associative, and then you'll need to be able to distinguish in the semantic actions between a comparison (which is a list of expressions and comparison operators) and any other expression (which is a string). The semantic action for a comparison operator needs to either extend or create the list, depending on whether the left-hand operand is a list or a string. The semantic action for any other operator (including parenthetic grouping) and for the right-hand operand in a comparison needs to check whether it has received a list, and if so compile it into a string.
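As a rough illustration of option 1, here is a minimal sketch of a cascading grammar written as a hand-rolled recursive-descent parser in Python rather than in Jison. The token set, the class name `CascadeParser` and the tuple-based tree shape are all assumptions made for illustration. Because the comparison level is its own rule, it can loop over any number of comparison operators and return the whole chain as a single node, which is what the question asks for.

```python
import re

# Hypothetical token set: integers, names, comparison/additive operators, parentheses.
# Multi-character operators come first so that "<=" is not read as "<" followed by "=".
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|==|!=|<=|>=|[<>+\-()])")
CMP_OPS = {"==", "!=", "<", "<=", ">", ">="}


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class CascadeParser:
    """Cascade, lowest precedence first (hypothetical):
    comparison := additive (CMP additive)*
    additive   := atom (('+' | '-') atom)*
    atom       := NUMBER | NAME | '(' comparison ')'
    """

    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.comparison()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def comparison(self):
        # Collect every operand and operator at this level into ONE chain node.
        operands, ops = [self.additive()], []
        while self.peek() in CMP_OPS:
            ops.append(self.take())
            operands.append(self.additive())
        return operands[0] if not ops else ("chain", operands, ops)

    def additive(self):
        tree = self.atom()
        while self.peek() in ("+", "-"):
            op = self.take()
            tree = (op, tree, self.atom())
        return tree

    def atom(self):
        tok = self.take()
        if tok == "(":
            tree = self.comparison()  # parentheses restart the cascade
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if tok is None or not (tok[0].isalnum() or tok[0] == "_"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


print(CascadeParser("1 < a < 10").parse())  # ('chain', ['1', 'a', '10'], ['<', '<'])
```

Parentheses still close off a chain, so `(1 < a) < 10` comes back as a two-operand chain whose first operand is itself a chain. Translating this into a Jison rule set is left open here; the sketch only shows the shape a cascading comparison level would have.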
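For option 2, the list-versus-string bookkeeping in the semantic actions might look roughly like the Python sketch below. It does not run Jison; it calls the action functions by hand in the order an LR parser would reduce `1 < a < 10` with left-associative comparisons. The function names, the compiled-code format and the runtime helper `chain_compare` are all hypothetical and are not filtrex's actual code generation.

```python
import operator

# Hypothetical semantic values, following option 2:
#   str  -> an already-compiled expression (target code as text)
#   list -> a comparison chain still being built: [e0, op1, e1, op2, e2, ...]
COMPARISONS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
               ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def chain_compare(values, ops):
    """Hypothetical runtime helper: evaluates a chain, using each operand once."""
    return all(COMPARISONS[op](a, b) for a, op, b in zip(values, ops, values[1:]))


def as_code(value):
    """Compile a pending chain into a string; strings pass through unchanged."""
    if isinstance(value, str):
        return value
    operands = ", ".join(value[0::2])
    ops = ", ".join(repr(op) for op in value[1::2])
    return f"chain_compare([{operands}], [{ops}])"


def comparison_action(left, op, right):
    """Action for `e CMP e`, with the comparison operators declared left-associative."""
    right = as_code(right)         # the right-hand operand is always closed off
    if isinstance(left, list):
        return left + [op, right]  # extend an existing chain
    return [left, op, right]       # start a new chain


def binary_action(left, op, right):
    """Action for any other binary operator: both operands must be strings."""
    return f"({as_code(left)} {op} {as_code(right)})"


def parens_action(inner):
    """Action for `( e )`: parentheses close off a chain."""
    return f"({as_code(inner)})"


# Reductions for `1 < a < 10`, grouped ((1 < a) < 10) by left associativity:
chain = comparison_action(comparison_action("1", "<", "a"), "<", "10")
code = as_code(chain)  # the top-level action (expressions = e EOF) would also need this
print(code)  # chain_compare([1, a, 10], ['<', '<'])
print(eval(code, {"chain_compare": chain_compare, "a": 5}))  # True
```

The top-level rule has to call `as_code` too, since a bare comparison such as `x == y` would otherwise reach it as a list.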

Whichever of those two options you choose, you'll probably want to fix the various precedence errors in the existing grammar, some of which were already present in your upstream source (like the unary minus / not confusion mentioned above). These include:

  • Exponentiation is configured as left-associative, whereas it is almost universally considered a right-associative operator. Many languages also give it higher precedence than unary minus, since -a² is pretty well always read as the negative of a squared rather than the square of minus a (which would just be a squared).
  • I suppose you are going to ditch the ternary operator ?: in favour of your if … then … else operator. But if you leave ?: in the grammar, you should make it right associative, as it is in every language other than PHP. (And the associativity in PHP is generally recognised as a design error. See this summary.)
  • The not in operator is actually two tokens, not and in, and not has quite high precedence. And that's how it will be parsed by your grammar, with the result that 4 + 3 in (7, 8) evaluates to true (because it was grouped as (4 + 3) in (7, 8)), while 4 + 3 not in (7, 8) evaluates rather surprisingly to 5, having been grouped as 4 + (3 not in (7, 8)).
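To make the intended grouping from the first bullet concrete, here is a minimal recursive-descent evaluator in Python. It is a different technique from the Jison precedence declarations the answer is talking about, and the `^` token, the float results and the class name `Evaluator` are assumptions for illustration. Exponentiation binds tighter than unary minus and is right-associative; its right operand is parsed at the unary level so that `2^-1` is still accepted.

```python
import re

# Hypothetical token set: decimal numbers, + - * /, "^" for exponentiation, parentheses.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/^()])")


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Evaluator:
    """expr  := term (('+' | '-') term)*
    term  := unary (('*' | '/') unary)*
    unary := '-' unary | power
    power := atom ('^' unary)?
    atom  := NUMBER | '(' expr ')'
    """

    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def evaluate(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.unary()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self):
        # Unary minus sits BELOW "^", so "-2^2" is -(2^2).
        if self.peek() == "-":
            self.take()
            return -self.unary()
        return self.power()

    def power(self):
        base = self.atom()
        if self.peek() == "^":
            self.take()
            # Recursing into unary (which reaches power again) makes "^"
            # right-associative and still allows "2^-1".
            return base ** self.unary()
        return base

    def atom(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or tok in ("+", "-", "*", "/", "^", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return float(tok)


print(Evaluator("-2^2").evaluate())   # -4.0
print(Evaluator("2^3^2").evaluate())  # 512.0
```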

Notes

  1. If you used a cascading precedence grammar, you'd see that only one of - not 0 and not - 0 is parseable. Of course, both are probably type violations, but that's not something the syntax should concern itself with.
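The note can be checked with a tiny recogniser. The Python sketch below uses a hypothetical two-rule cascade (not the actual grammar from the repository) in which `not` binds tighter than unary minus, as in the grammar under discussion. It accepts `- not 0` but rejects `not - 0`, because nothing at the `not` level can derive a leading `-`.

```python
# Hypothetical two-level cascade in which `not` binds tighter than unary minus:
#   neg_level := '-' neg_level | not_level
#   not_level := 'not' not_level | DIGITS
def parse_cascade(tokens):
    pos = 0

    def neg_level():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "-":
            pos += 1
            return f"(-{neg_level()})"
        return not_level()

    def not_level():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "not":
            pos += 1
            return f"(not {not_level()})"
        if pos < len(tokens) and tokens[pos].isdigit():
            pos += 1
            return tokens[pos - 1]
        # A "-" cannot start anything at this level, so "not - 0" fails here.
        raise SyntaxError(f"unexpected token at position {pos}")

    tree = neg_level()
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at position {pos}")
    return tree


print(parse_cascade("- not 0".split()))  # (-(not 0))
```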
rici
  • Thank you for a very thorough and helpful answer! I am especially thankful that you pointed out the errors in the current grammar – I've implemented the fixes already! However, I'm not really sure about `- not 0` and `not - 0`, which produce `-1` and `1` respectively, which is exactly what I'd expect... (Although for the next version, I've made type coercions more strict, so both will result in a runtime error.) – m93a Jun 23 '21 at 01:35
  • Just to be sure – I've managed to write this grammar: [src](https://github.com/m93a/filtrex/blob/e30a9ab02a5ec187243db071944b2af5b5c22cab/src/generateParser.mjs). While compiling the parser, Jison throws a lot of warnings about “state conflicts” and “multiple actions possible when [...]”, but the compiled parser ends up doing exactly what I asked for in the question. Do I understand it correctly that it does so just because by pure coincidence Jison chose the correct branch, and that there is no way to ensure it will choose it consistently? (I mean: only by specifying precedence?) – m93a Jun 23 '21 at 01:44
  • @m93a: Sorry, the comment about `not` and `-` wasn't very clear. I meant that those precedences wouldn't work if you used a cascading precedence grammar. They'll work fine with declared precedence. But it's still sloppy. Multiple unary operators of the same type should not be in consecutive precedence levels; they should all be in the same precedence level. It's either meaningless or wrong to put them in different precedence levels. – rici Jun 23 '21 at 01:58
  • It's possible that the precedence ordering the default rules apply is what you want, but you'll need a lot more unit tests than I saw in your repository to check all the possibilities. (Those unit tests should have found the errors I noted in my answer. Seven years is a long time for a bug to go unnoticed. So it seems clear that more tests are needed in any case.) Jison will use the same default rules, so if they work, they work. Lots of in-production grammars have `%expect n` declarations (which means "I expect n conflicts, so don't warn me unless there is a different number of them."). – rici Jun 23 '21 at 02:05
  • So there is precedent. But many of us don't like grammars with "expected" conflicts, because they are hard to add new features to. – rici Jun 23 '21 at 02:06
  • Awesome, thanks again! I think I'll avoid grammar conflicts and use the option #2 you suggested. – m93a Jun 23 '21 at 02:11