Can anyone tell me what the difference is between the following two sets of rules? (Notice the order.)

  1. The first, which doesn't work:

    without => "[" "]" without | "[" "]"
    with => "[" INDEX "]" with | "[" INDEX "]"
    array => ID with | ID without | ID with without
    
  2. The second, which seemingly works:

    without => without "[" "]" | "[" "]"
    with => with "[" INDEX "]" | "[" INDEX "]"
    array => ID with | ID without | ID with without
    

I am trying to express the syntax of an n-dimensional array with sizes, like C# arrays. So the following should be accepted: arr[], arr[1], arr[1][], arr[1][1], arr[][]; but not ones like arr[][1].
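Stated as a pattern, the intended language is an identifier, then zero or more sized dimensions, then zero or more empty dimensions, with at least one dimension overall. One quick way to pin that down before writing grammar rules is a regular expression. This is only a sketch under assumptions not in the question: ID is a C-style identifier, INDEX is a decimal integer literal, and there is no whitespace; `ARRAY_RE` and `is_valid_array` are hypothetical names.

```python
import re

# Assumptions: ID is a C-style identifier, INDEX is a decimal literal,
# and no whitespace appears. The (?=\[) lookahead requires at least one dimension.
ARRAY_RE = re.compile(r"[A-Za-z_]\w*(?=\[)(?:\[\d+\])*(?:\[\])*")

def is_valid_array(text):
    """ID, then zero or more sized dims, then zero or more empty dims."""
    return ARRAY_RE.fullmatch(text) is not None

for sample in ["arr[]", "arr[1]", "arr[1][]", "arr[1][1]", "arr[][]", "arr[][1]"]:
    print(sample, is_valid_array(sample))
```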

rici
  • The first set of rules is right-recursive; Bison prefers left-recursive rules, though it works with right-recursive ones. Generally, aim to use left-recursive rules with Bison. – Jonathan Leffler Jan 07 '19 at 18:43

1 Answer

I'm assuming that by "doesn't work", you mean that bison reports a shift/reduce conflict. If you go ahead and use the generated parser anyway, then it will not parse correctly in many cases, because the conflict is real and cannot be resolved by any static rule.

The issue is simple. Remember that an LALR(1) bottom-up parser like the one generated by Bison performs every reduction exactly at the end of the right-hand side, taking into account only the next token (the "lookahead token"). So it must know which production to use at the moment the production has been completely read. (That gives it a lot more latitude than a top-down parser, which needs to know which production it will use at the beginning of the production. But it's still not always enough.)

The problematic case is the production `ID with without`. Here, whatever input matches `with` needs to be reduced to a single non-terminal `with` before the parser continues with `without`. To get to this point, the parser must have passed over some number of `'[' INDEX ']'` dimensions, and the lookahead token must be `[`, regardless of whether the next dimension has a definite size or not.

If the with rule is right-recursive:

with: '[' INDEX ']' with
    | '[' INDEX ']'

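then the parser is really stuck. If what follows has a definite dimension, it needs to continue with the first production, which means shifting the `[`. If what follows has no INDEX, it needs to reduce using the second production, which will trigger a chain of reductions leading back to the beginning of the list of dimensions. In both cases the lookahead token is the same `[`; the token that distinguishes them (INDEX or `]`) is one position further along, and an LALR(1) parser cannot see it.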
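To make the lookahead argument concrete, here is a small sketch that tokenizes two inputs sharing the prefix `arr[1]` and shows what a parser could inspect right after shifting `ID '[' INDEX ']'`. The `tokenize` and `lookahead` helpers are hypothetical (not part of Bison), and INDEX is assumed to be a decimal literal. With one token of lookahead the two inputs look identical; with two they differ, which is consistent with a two-token lookahead parser being able to handle the original grammar.

```python
import re

def tokenize(text):
    """Turn e.g. 'arr[1][]' into token kinds ID, INDEX, '[', ']', then '$end'.
    Assumption: INDEX is a decimal literal; other characters are ignored."""
    kinds = []
    for m in re.finditer(r"[A-Za-z_]\w*|\d+|[\[\]]", text):
        lexeme = m.group()
        if lexeme in ("[", "]"):
            kinds.append(lexeme)
        elif lexeme[0].isdigit():
            kinds.append("INDEX")
        else:
            kinds.append("ID")
    kinds.append("$end")
    return kinds

def lookahead(tokens, consumed, k):
    """The k tokens an LR(k) parser can inspect after shifting `consumed` tokens."""
    return tuple(tokens[consumed:consumed + k])

sized = tokenize("arr[1][2]")   # parser must shift: another sized dimension follows
empty = tokenize("arr[1][]")    # parser must reduce: the 'with' list is over

# Decision point: ID '[' INDEX ']' (4 tokens) has been shifted.
print(lookahead(sized, 4, 1), lookahead(empty, 4, 1))  # ('[',) ('[',)
print(lookahead(sized, 4, 2), lookahead(empty, 4, 2))  # ('[', 'INDEX') ('[', ']')
```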

On the other hand, with a left-recursive rule:

with: with '[' INDEX ']'
    | '[' INDEX ']'

the parser has no problem at all, because each with is reduced as soon as the ] is seen. That means that the parser doesn't have to know what follows in order to decide to reduce. It decides between the two rules based on the past, not the future: the first dimension in the array uses the second production, and the remaining ones (which follow a with) use the first one.

That's not to say that left-recursion is always the answer, although it often is. As can be seen in this case, right-recursion of a list means that individual list elements pile up on the parser stack until the list is eventually terminated, while left-recursion allows the reductions to happen immediately, so that the parser stack doesn't need to grow. So if you have a choice, you should generally prefer left-recursion.
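The stack-growth difference can be illustrated with a toy simulation. This is not Bison's generated parser; it is a hypothetical sketch (`DIM`, `parse_left` and `parse_right` are illustrative names) that only tracks which symbols a shift-reduce parser would hold on its stack while reading n sized dimensions with each form of the `with` rule, assuming the parser already knows where the list ends (the lookahead after the last `]` is not `[`).

```python
DIM = ["[", "INDEX", "]"]

def parse_left(n):
    """with: with '[' INDEX ']' | '[' INDEX ']'   (left-recursive).
    Returns (final stack, peak stack depth) for n >= 1 dimensions."""
    stack, peak = [], 0
    for _ in range(n):
        for tok in DIM:                  # shift '[' INDEX ']'
            stack.append(tok)
            peak = max(peak, len(stack))
        # The ']' completes a right-hand side immediately: either
        # '[' INDEX ']' (first dim) or with '[' INDEX ']' (later dims).
        del stack[-3:]
        if stack and stack[-1] == "with":
            stack.pop()
        stack.append("with")
    return stack, peak

def parse_right(n):
    """with: '[' INDEX ']' with | '[' INDEX ']'   (right-recursive).
    Returns (final stack, peak stack depth) for n >= 1 dimensions."""
    stack, peak = [], 0
    for _ in range(n):                   # nothing can be reduced yet: shift all
        for tok in DIM:
            stack.append(tok)
            peak = max(peak, len(stack))
    # End of list: reduce the last '[' INDEX ']' to 'with' ...
    del stack[-3:]
    stack.append("with")
    # ... then unwind '[' INDEX ']' with -> with back to the first dimension.
    while len(stack) > 1:
        del stack[-4:]
        stack.append("with")
    return stack, peak

for n in (1, 5, 100):
    print(n, parse_left(n)[1], parse_right(n)[1])  # 1 3 3 / 5 4 15 / 100 4 300
```

In this model the left-recursive version never holds more than four symbols (`with '[' INDEX ']'`), while the right-recursive one holds all 3n tokens before its first reduction.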

But sometimes right-recursion can be convenient, particularly in syntaxes like this where the end of the list is different from the beginning. Another way of writing the grammar could be:

array  : ID dims
dims   : without
       | '[' INDEX ']'
       | '[' INDEX ']' dims
without: '[' ']'
       | '[' ']' without

Here, the grammar only accepts empty dimensions at the end of the list because of the structure of dims. But to achieve that effect, dims must be right-recursive, since it is the end of the list which has the expanded syntax.
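Because this alternative grammar is right-recursive, it also maps directly onto a hand-written top-down recognizer. The sketch below is hypothetical (`tokenize`, `ParseError`, `DimsParser` and `accepts` are not from the answer), is not what Bison would generate, and assumes INDEX is a decimal literal. Choosing between `without` and a sized dimension needs a second token (`[` `]` versus `[` INDEX), so the recognizer peeks two tokens; the two sized alternatives of `dims` are left-factored by checking for a following `[` after the `]`.

```python
import re

# Hypothetical tokenizer: ID, INDEX (assumed decimal), '[' and ']'.
TOKEN_RE = re.compile(
    r"(?P<ID>[A-Za-z_]\w*)|(?P<INDEX>\d+)|(?P<BR>[\[\]])|(?P<SKIP>\s+)|(?P<BAD>.)"
)

class ParseError(Exception):
    pass

def tokenize(text):
    kinds = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "BAD":
            raise ParseError(f"unexpected character {m.group()!r}")
        if kind == "BR":
            kinds.append(m.group())      # '[' or ']' stands for itself
        elif kind != "SKIP":
            kinds.append(kind)           # 'ID' or 'INDEX'
    kinds.append("$end")
    return kinds

class DimsParser:
    """Recursive descent for:
         array   : ID dims
         dims    : without | '[' INDEX ']' | '[' INDEX ']' dims
         without : '[' ']' | '[' ']' without
    """
    def __init__(self, text):
        self.toks = tokenize(text)
        self.pos = 0

    def peek(self, k=0):
        # Past the end we keep returning '$end'.
        return self.toks[min(self.pos + k, len(self.toks) - 1)]

    def expect(self, kind):
        if self.peek() != kind:
            raise ParseError(f"expected {kind!r}, got {self.peek()!r}")
        self.pos += 1

    def array(self):
        self.expect("ID")
        self.dims()
        self.expect("$end")

    def dims(self):
        # '[' ']' starts 'without'; '[' INDEX starts a sized dimension.
        if self.peek() == "[" and self.peek(1) == "]":
            self.without()
            return
        self.expect("[")
        self.expect("INDEX")
        self.expect("]")
        if self.peek() == "[":           # '[' INDEX ']' dims
            self.dims()

    def without(self):
        self.expect("[")
        self.expect("]")
        if self.peek() == "[":           # '[' ']' without
            self.without()

def accepts(text):
    try:
        DimsParser(text).array()
        return True
    except ParseError:
        return False

for s in ["arr[]", "arr[1]", "arr[1][]", "arr[1][1]", "arr[][]", "arr[][1]"]:
    print(s, accepts(s))
```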

rici
  • Thank you, rici, for this explanation and the extra solution. Yes, there was a shift/reduce warning that I didn't pay attention to. To be sure, I tried replacing `[` with `(` for empty dimensions, just to get around the disputed shift, and it works (I know, that's a completely different thing ;) ). Now, because I don't know of another parser generator using LR(0), or maybe LR(2) for our case, I have some doubt: would the problem be solved with these two types of parsers? Also, is it fair to say that with LL parsers we should always prefer right-recursion? – miracle genuis Jan 07 '19 at 19:32
  • @miraclegenius: Your original grammar would work with a two-token lookahead parser; you should try to understand why that is the case. However, either changing to left-recursion or using the second solution would be much better, since there are no generally available LR(2) parser generators. (GLR would be overkill here.) Top-down parsers cannot handle left-recursion at all, so I would say you should always prefer bottom-up parsers which let you use whichever of left- and right-recursion best suits your grammar :-) – rici Jan 07 '19 at 19:40
  • I got your answer; it was clear. That's why I suggested LR(2): I was just trying to deepen my understanding. Could you point me to some tools for debugging such conflicts, and also for pretty-printing or perhaps drawing a tree (an "AST")? BTW, I was happy with my second set of grammar rules, but then I saw yours and I used them :D. – miracle genuis Jan 07 '19 at 19:52
  • @miracle: OK, cool. Bison will generate a file with the state transitions if you invoke it with the `-v` option. That's a pretty good way to see where the conflict is. You can also ask it to produce a .dot file, but that's really only useful with toy grammars; real grammars have way too many states. I agree that an automatic AST builder would be useful for students, but I'm not so sure that it would be useful for production users. It's really easy to create an AST; the tough part is deciding what information you want to keep in it. – rici Jan 07 '19 at 20:15
  • OK @rici. I will check the verbose option and the dot file. Thank you again, rici, for your explanation and the alternative solution. I appreciate it. – miracle genuis Jan 07 '19 at 20:37