I am learning about lexers and parsers, so I am reading the classic book flex & bison (by John Levine, O'Reilly Media). It gives an example of a grammar that bison cannot parse:

phrase : cart_animal AND CART | work_animal AND PLOW
cart_animal : HORSE | GOAT
work_animal : HORSE | OX

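I understand why it cannot: after reading HORSE, the parser must choose between cart_animal and work_animal, but the next token is AND in both cases, so it needs two symbols of lookahead.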

But with a simple modification, it can be parsed:

phrase : cart_animal CART | work_animal PLOW
cart_animal : HORSE AND | GOAT AND
work_animal : HORSE AND | OX AND

Why can't bison automatically transform the grammar in simple cases like this?
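The difference between the two grammars can be checked by hand with a small Python sketch. It is not a general LR(k) test: it only works for this tiny finite language, and the names `derivations`, `language` and `lookahead_needed` are made up for the illustration. It counts how many tokens after the animal's right-hand side must be seen before the choice between cart_animal and work_animal is determined.

```python
# Each grammar maps a nonterminal to its alternatives (tuples of symbols).
ORIGINAL = {
    "phrase": [("cart_animal", "AND", "CART"), ("work_animal", "AND", "PLOW")],
    "cart_animal": [("HORSE",), ("GOAT",)],
    "work_animal": [("HORSE",), ("OX",)],
}
MODIFIED = {
    "phrase": [("cart_animal", "CART"), ("work_animal", "PLOW")],
    "cart_animal": [("HORSE", "AND"), ("GOAT", "AND")],
    "work_animal": [("HORSE", "AND"), ("OX", "AND")],
}


def derivations(grammar):
    """Yield (sentence, animal nonterminal, length of its right-hand side)."""
    for alternative in grammar["phrase"]:
        animal, rest = alternative[0], alternative[1:]
        for body in grammar[animal]:
            yield body + rest, animal, len(body)


def language(grammar):
    """Set of all sentences the grammar generates."""
    return {sentence for sentence, _, _ in derivations(grammar)}


def lookahead_needed(grammar):
    """Smallest k such that the k tokens following the animal's right-hand
    side always determine which nonterminal to reduce to (toy check only)."""
    derivs = list(derivations(grammar))
    longest = max(len(sentence) for sentence, _, _ in derivs)
    for k in range(longest + 1):
        choice = {}
        consistent = True
        for sentence, animal, n in derivs:
            key = (sentence[:n], sentence[n:n + k])
            if choice.setdefault(key, animal) != animal:
                consistent = False
                break
        if consistent:
            return k
    return None


print(language(ORIGINAL) == language(MODIFIED))
print(lookahead_needed(ORIGINAL), lookahead_needed(MODIFIED))
```

Both grammars generate the same four sentences; only the point at which the reduction has to be decided moves, which is why the second version fits within one token of lookahead.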

Stef1611

1 Answer

Because simple cases like that are almost all artificial, and in real-life examples the transformation is difficult or impossible.

To be clear, if you have an LR(k) grammar with k>1 and you know the value of k, there is a mechanical transformation with which you can make an equivalent LR(1) grammar, and moreover you can, with some juggling, fix the reduction actions so that they have the same effect (at least, as long as they don't contain side effects). I don't know any parser generator which does that, in part because correctly translating the reduction actions will be tricky, and in part because the resulting LR(1) grammar is typically quite large, even for small values of k.

But, as I mentioned above, you need to know the value of k to perform this transformation, and it turns out that there is no algorithm which can take a grammar and tell you whether it is LR(k) for some k. (For one fixed k the question can be answered; it is the existence of a suitable k that cannot be decided in general.) So all you could do is try successively larger values of k until you find one which works, or you decide to give up.
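The "try successively larger values of k" procedure might be sketched in Python as follows. This is only an illustration: `is_lr_k` is a hypothetical predicate supplied by the caller (no such function is provided by bison), and `max_k` is the arbitrary point at which the search gives up.

```python
from typing import Callable, Optional


def smallest_k(grammar: object,
               is_lr_k: Callable[[object, int], bool],
               max_k: int) -> Optional[int]:
    """Return the first k in 1..max_k for which is_lr_k(grammar, k) holds.

    Returns None if no k up to max_k works. That result does NOT prove the
    grammar is not LR(k) for some larger k; the search has merely given up.
    """
    for k in range(1, max_k + 1):
        if is_lr_k(grammar, k):
            return k
    return None


# Stand-in predicate for demonstration only: it pretends the grammar
# becomes parseable once two tokens of lookahead are available.
def toy_is_lr_k(grammar: object, k: int) -> bool:
    return k >= 2


print(smallest_k("animal grammar", toy_is_lr_k, max_k=5))
print(smallest_k("animal grammar", toy_is_lr_k, max_k=1))
```

The cutoff is the important part: without it the loop would never terminate on a grammar that is not LR(k) for any k.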

rici
  • Thanks for the answer. Just for information, as I am a beginner on the subject: is there any software that can handle LR(2 or more) grammars? – Stef1611 Apr 18 '19 at 15:32
  • @Stef1611: Bison (and various other parser generators) can produce GLR parsers, which can handle any context-free grammar, even ambiguous ones (the parser explores all branches in parallel and results in a "forest" of possible parses). But of course there's a catch: the parser is no longer guaranteed to operate in time proportional to the size of the input. However, with many unambiguous grammars, the parser will still work in linear time, albeit with some overhead. Also, as above, actions should not have side effects. Bison can't verify this, so it's the programmer's responsibility. – rici Apr 18 '19 at 15:50
  • ... There is a GLR algorithm which has worst case O(N³) time, and in theory the exponent can be reduced to some value between 2 and 3 (although it doesn't seem to be very practical). But Bison's implementation prefers to aim at low overhead in frequent cases (and simplicity of implementation), so it can have exponential running time in pathological cases. It should be used with caution on grammars which are highly ambiguous. On the plus side, for grammars which are "mostly LR(1)", its GLR overhead is negligible. – rici Apr 18 '19 at 15:58
  • I have not yet read the chapter on GLR. On the web, I have read scattered pieces of information about parsing CFGs (LL(0), LR(0), SLR, LALR, LR, LR(k), shift-reduce, ...). I now think that to fully understand the problem I need a complete treatment rather than disparate pieces. Can you suggest a link or a freely available book? For example, I would like to fully understand, and be able to prove, things such as: LL(0) < LL(1) < LL(k). LR(0) < SLR(1) < LALR(1) < LR(1) < LR(k). LL(k) < LR(k) – Stef1611 Apr 18 '19 at 16:23
  • @stef1611: Wikipedia? Also, Parsing Techniques: A Practical Guide. I think the first edition is still downloadable, but if you have a budget you could think about the new edition. – rici Apr 18 '19 at 18:10
  • Thank you very much. I will try to find "Parsing Techniques: A Practical Guide". As for Wikipedia, I have read some pages. That is where I found some of the information, but I am looking for something more complete. – Stef1611 Apr 18 '19 at 20:07