As I understand it, the following grammar needs left factoring before a top-down parser can be built for it, but I can't work out how to do that. Can someone help me here? Thanks.

s = a | b
b = c d
c = (e | f) g
e = a | h
Endophage
Bee
  • Well, it depends on what the productions `a`, `d`, `f`, `g` and `h` are. If they're "simple" terminals, no left factoring is needed, AFAIK. – Bart Kiers Mar 02 '12 at 20:54
  • @BartKiers : Did you notice that in my example, `b also contains a in its left if you go through b->c->e->a` ? that means it could be `s = a | a + something`. Do you still say left factoring is not required? Thanks. – Bee Mar 02 '12 at 20:58
  • @Bhathiya: Left factoring is applied to transform the grammar so that control can't loop without consuming any tokens, which would lead to an endless loop when parsing. That's not the case here. The issue here is that this grammar can't be parsed with an LL(1) (single-symbol look-ahead) parser. – 500 - Internal Server Error Mar 02 '12 at 21:27
  • @500-InternalServerError: What is the solution you suggest to avoid the loop scenario in this case? – Bee Mar 03 '12 at 03:30

1 Answer

Every non-terminal is only referenced once here, so we can pull the entire grammar together in a single expression:

s = a | ((a | h | f) g d)

So there are two basic variations: a, optionally followed by g then d; or one of h or f, always followed by g then d.
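One way to sanity-check a rewrite like this is to enumerate every string each version of the grammar can produce and compare the results. The sketch below is only an illustration: it assumes a, d, f, g and h are terminals (the question does not say), and the names `ORIGINAL`, `FLATTENED` and `language` are made up for this example.

```python
from itertools import product

# Assumption: any symbol without a rule (a, d, f, g, h) is a terminal.
ORIGINAL = {
    "s": [["a"], ["b"]],
    "b": [["c", "d"]],
    "c": [["e", "g"], ["f", "g"]],
    "e": [["a"], ["h"]],
}

# s = a | ((a | h | f) g d), written out as plain alternatives.
FLATTENED = {
    "s": [["a"], ["a", "g", "d"], ["h", "g", "d"], ["f", "g", "d"]],
}

def language(grammar, symbol):
    """Return every token sequence derivable from symbol.

    Only works for non-recursive grammars, which is the case here.
    """
    if symbol not in grammar:
        return {(symbol,)}  # terminal: derives only itself
    result = set()
    for alternative in grammar[symbol]:
        parts = [language(grammar, sym) for sym in alternative]
        # Concatenate one choice from each symbol of the alternative.
        for combo in product(*parts):
            result.add(tuple(tok for part in combo for tok in part))
    return result

same = language(ORIGINAL, "s") == language(FLATTENED, "s")
```

Under those assumptions both versions derive the same four strings, two of which start with a. That shared prefix is what a single token of lookahead cannot resolve.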

So we have

s =  b' | c'
b' = a | a g d
c' = (h | f) g d

or, pulling the common g d sequence into its own production

s =  b' | c'
b' = a | a e'
c' = (h | f) e'
e' = g d

We can then pull a up as a common starting symbol in b' by introducing an E (empty) option:

s =  b'' | c'
b'' = a (e' | E)
c' = (h | f) e'
e' = g d

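The grammar is now left-factored: every alternative starts with a different symbol (a for b'', h or f for c'), so a top-down parser can pick a production by looking at a single token, assuming a, d, f, g and h are terminals.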
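To show how the final grammar maps onto a top-down parser, here is a minimal recursive-descent sketch with one token of lookahead. It assumes a, d, f, g and h are single terminal tokens and that the input is a list of token names. The function names (`accepts`, `b2`, `c1`, `e1`) are hypothetical stand-ins for b'', c' and e'.

```python
def accepts(tokens):
    """Return True if the token list is derivable from s in the factored grammar."""
    pos = 0

    def peek():
        # Current lookahead token, or None at end of input.
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, got {peek()!r}")
        pos += 1

    def e1():  # e' = g d
        expect("g")
        expect("d")

    def b2():  # b'' = a (e' | E)
        expect("a")
        if peek() == "g":  # take e' only if it can start here, otherwise the empty option
            e1()

    def c1():  # c' = (h | f) e'
        if peek() not in ("h", "f"):
            raise SyntaxError(f"expected 'h' or 'f' at position {pos}")
        expect(peek())
        e1()

    def s():  # s = b'' | c'
        if peek() == "a":
            b2()
        else:
            c1()

    try:
        s()
        if pos != len(tokens):
            raise SyntaxError(f"unexpected trailing token at position {pos}")
        return True
    except SyntaxError:
        return False
```

For example, `accepts(["a"])` and `accepts(["a", "g", "d"])` both succeed, because the choice between e' and the empty option is made only after a has been consumed. Each function decides which branch to take from `peek()` alone, which is the property the left factoring was meant to provide.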