
I am trying to understand how bison builds the tables for this simple grammar:

input: rule ;
rule: rule '+' '1' 
    | '1' ;

I was able to calculate the LR(1) transition table and item sets, but I don't understand how state 3 is built or how it works:

State 3

1 input: rule .  [$end]
2 rule: rule . '+' '1'

'+'  shift, and go to state 5

$default  reduce using rule 1 (input)

For the default reduce rule I should put 'r1' into every cell of this state's row in the ACTION table, one for each symbol. But for the shift rule I should put 's5' into the column for the '+' terminal (this cell already contains 'r1'). To me this looks like a shift/reduce conflict, but not to bison. Please explain how that '[$end]' lookahead symbol appeared in this state, and how this state is processed overall by the LR state machine.

xsp-server-hater

1 Answer

  1. default means "everything else", not "everything". In other words, you first fill in the specified actions, and then use the default action for any other lookahead symbol.

    If there is no default action, the action for any unspecified lookahead symbol is an error. Default reduce actions are often used to reduce table size where the algorithm would otherwise trigger an error. This optimization may have the result of delaying the reporting of an error, but the error will always be detected before another input is consumed, precisely because an error action is never replaced with a default shift. (Indeed, many parser generators never use default shift actions at all.)

  2. If you look at the grammar shown at the beginning of the .output file, you'll see that it has been augmented with the production:

    0 $accept: input $end
    

    Yacc/bison always adds a production like that to ensure that the complete input matches the start symbol. (The non-terminal input will, of course, be replaced by whatever start symbol has been declared with the %start directive, or with the first non-terminal in the grammar.)

    There is nothing special about this rule aside from the fact that reducing the $accept symbol causes the input to be accepted. (You can see that in state 4).

    $end is a special terminal symbol automatically generated when EOF is detected. (To be more precise, it is the terminal whose token value is 0, which the scanner returns to indicate end of file; (f)lex-generated scanners do this automatically.)
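Both points can be checked mechanically. The following is a minimal Python sketch, not bison's actual algorithm: it runs a canonical LR(1) closure (bison uses LALR by default, but for this small grammar the lookahead of the reduction comes out the same) over the augmented grammar. The helper names `first`, `closure`, `goto`, `actions` and `lookup` are made up for illustration, and `first` assumes that no symbol derives the empty string, which holds for this grammar only.

```python
# Grammar from the question plus the $accept rule that bison adds.
# List indices match the rule numbers in bison's .output file.
GRAMMAR = [
    ("$accept", ("input", "$end")),    # rule 0
    ("input", ("rule",)),              # rule 1
    ("rule", ("rule", "'+'", "'1'")),  # rule 2
    ("rule", ("'1'",)),                # rule 3
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first(symbol, visiting=frozenset()):
    """FIRST of one symbol; adequate here because nothing derives empty."""
    if symbol not in NONTERMINALS:
        return {symbol}
    visiting = visiting | {symbol}
    result = set()
    for lhs, rhs in GRAMMAR:
        if lhs == symbol and rhs[0] not in visiting:
            result |= first(rhs[0], visiting)
    return result


def closure(items):
    """LR(1) closure over items of the form (rule_number, dot, lookahead)."""
    items = set(items)
    work = list(items)
    while work:
        number, dot, lookahead = work.pop()
        rhs = GRAMMAR[number][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        # New lookaheads: FIRST of the symbol after the non-terminal, or
        # the inherited lookahead when the non-terminal is last in the rule.
        follow = first(rhs[dot + 1]) if dot + 1 < len(rhs) else {lookahead}
        for n, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for la in follow:
                    if (n, 0, la) not in items:
                        items.add((n, 0, la))
                        work.append((n, 0, la))
    return items


def goto(items, symbol):
    """Advance the dot over `symbol` and close the resulting item set."""
    moved = {(n, d + 1, la) for n, d, la in items
             if d < len(GRAMMAR[n][1]) and GRAMMAR[n][1][d] == symbol}
    return closure(moved)


def actions(items):
    """Explicit ACTION entries for one state, before any default is chosen."""
    row = {}
    for n, d, la in items:
        rhs = GRAMMAR[n][1]
        if d == len(rhs):
            row.setdefault(la, set()).add(f"reduce {n}")
        elif rhs[d] not in NONTERMINALS:
            row.setdefault(rhs[d], set()).add("shift")
    return row


def lookup(explicit, default, token):
    """Explicit entries win; the default covers only the remaining tokens."""
    return explicit.get(token, default)


state0 = closure({(0, 0, None)})  # $accept: . input $end
state3 = goto(state0, "rule")     # the state the question asks about
explicit = actions(state3)
compressed = {"'+'": "shift 5"}   # state 3 as printed, with $default = reduce 1
```

In `state0`, the item `input: . rule` gets the lookahead `$end` because `$end` is the symbol that follows `input` in rule 0; `goto` then carries that lookahead unchanged into `state3`. The `explicit` row has exactly one entry per token (`reduce 1` for `$end`, a shift for `'+'`), so no cell holds two actions. With the compressed row, `lookup(compressed, "reduce 1", "'+'")` returns the shift, and any other token falls through to the default reduction.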

rici
  • I don't understand how the last part of the extended item is calculated, i.e. how that $end symbol is propagated from rule 0 into state 3 (it's the essence of LALR and I don't understand it :( ). It is described in papers like http://www.larc.usp.br/~pbarreto/LR.pdf but it's hard for me to understand – xsp-server-hater Sep 08 '17 at 17:25
  • How LOOKAHEAD is constructed for separate productions: https://stackoverflow.com/a/13731963/4158543 – xsp-server-hater Sep 08 '17 at 19:57
  • @xsp: the set called LOOKAHEAD in that answer has absolutely nothing to do with the use of lookahead in LR parsing. I don't think it's appropriate for LL parsing either, but that's a different issue. – rici Sep 08 '17 at 22:26
  • How is this LOOKAHEAD different from [that](https://stackoverflow.com/a/37633332/4158543) lookahead set? – xsp-server-hater Sep 09 '17 at 04:36
  • @xsp: That "lookahead" set is used to predict which of a non-terminal's productions should be used, and therefore contains the possible first terminals for the production. The LR LOOKAHEAD set is used to decide whether or not to reduce a production, and therefore contains the possible terminals which can follow the reduced production *in the current context*. – rici Sep 09 '17 at 04:39
  • @xsp: The LALR(1) lookahead computation is well described in the [Dragon Book](https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools), which is a standard reference for compiler construction. I recommend it. However, another good text is Dick Grune's Parsing Techniques: A practical guide, whose first edition is [available for download](https://dickgrune.com/Books/PTAPG_1st_Edition/). I don't think I can explain the concepts as well as those two books, so I'm not going to try. – rici Sep 09 '17 at 20:28