S -> ABCD
A -> ae | af | ag | ah
B -> b | ε
C -> hcd | bcd | cd
D -> e | f | g | h
I've already tried left factorization on 2 and 4 but I'm stuck with |
in many of my productions.
There are 96 ways to derive a string of terminals in this grammar. We suspect some of these derivations produce redundant strings of terminals, so the number of strings in the generated language is actually less than 96. We would like to arrange it so that each derivation of a string of terminals yields a distinct string.
We could list all 96 derivations, sort them by the derived string, and then figure out how to avoid the ambiguity that way. That would take a little longer than I'd like and we can probably intelligently narrow the search space for duplicate strings substantially by analysis.
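For a grammar this small, the brute-force listing is cheap to automate. The following is a minimal sketch, assuming the only derivations are the ones that pick one alternative for each of A, B, C and D under S -> ABCD; the names `count_derivations`, `counts` and `duplicates` are my own, and the empty string stands in for ε.

```python
from collections import Counter
from itertools import product

# Alternatives for each nonterminal in S -> A B C D; "" stands for ε.
A = ["ae", "af", "ag", "ah"]
B = ["b", ""]
C = ["hcd", "bcd", "cd"]
D = ["e", "f", "g", "h"]


def count_derivations(*alternatives):
    """Map each derived string to the number of derivations that produce it."""
    return Counter("".join(choice) for choice in product(*alternatives))


counts = count_derivations(A, B, C, D)
# Strings reachable by more than one derivation are the ambiguous ones.
duplicates = sorted(s for s, n in counts.items() if n > 1)

print(sum(counts.values()))  # total derivations: 96
print(len(counts))           # distinct strings: 80
print(duplicates)            # the strings derived more than once
```

Every string it reports as duplicated has the form a x bcd y, which is a useful cross-check on the analysis by hand.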
We have no choice but to use the production S -> ABCD. Next, we must choose exactly one of the productions A -> ae, A -> af, A -> ag, A -> ah. Still, no ambiguity is possible among the choices so far. Next, we must choose either B -> b or B -> ε. Still there is no ambiguity. It is with our choice of the production to expand C that we introduce ambiguity for the first time. The issue is that cd is a suffix of hcd and bcd, so when cd follows a string that ends in h or b, the result ends in hcd or bcd as well. With this in mind, we find the following duplicate derivations:
S -> ABCD -> axBCD -> axbCD -> axbcdD -> axbcdy
S -> ABCD -> axBCD -> axCD -> axbcdD -> axbcdy
Above, x and y each stand in for one of the symbols e, f, g or h. The ambiguity arises because we can get the b either from B -> b (followed by C -> cd) or from C -> bcd (preceded by B -> ε).
Before we proceed, we should rewrite the grammar to eliminate this source of ambiguity; there's no point going farther until we get past this. How can we resolve this? In this case, consider what the grammar might look like if we combined symbols A and B into a new symbol A'. Then the productions would be:
A' -> ae | af | ag | ah | aeb | afb | agb | ahb
However, we will find the same problem still exists; the problem was not originally between the productions for B and those for A, but between those for B and those for C. We might instead try:
B' -> hcd | bcd | cd | bhcd | bbcd
Notice, crucially, that we only listed five terms above, rather than 6 - because one production, B' -> bcd, is generated twice when we combine the adjacent productions for B and C (once as b followed by cd, and once as ε followed by bcd). When you see this happen it means you are eliminating ambiguity. Our new grammar looks like:
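The merge of B and C can be sketched mechanically: concatenate every alternative of B with every alternative of C and drop repeats. This is only an illustration of that step; the names `pairs` and `merged` are mine, and the empty string again stands in for ε.

```python
from itertools import product

B = ["b", ""]            # "" stands for ε
C = ["hcd", "bcd", "cd"]

# Every way of writing an alternative of B followed by an alternative of C.
pairs = [b + c for b, c in product(B, C)]

# Collapse repeats; sort by length, then alphabetically, for readability.
merged = sorted(set(pairs), key=lambda s: (len(s), s))

print(len(pairs))   # 6 combinations
print(merged)       # ['cd', 'bcd', 'hcd', 'bbcd', 'bhcd']
```

The single combination that disappears when duplicates are dropped is bcd, which is exactly the overlap between B and C.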
S -> AB'C
A -> ae | af | ag | ah
B' -> cd | hcd | bcd | bhcd | bbcd
C -> e | f | g | h
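As a quick sanity check on the rewritten grammar, the same brute-force count can be repeated. This sketch assumes the start production is read as A followed by B' followed by C (the old D renamed to C); `B_PRIME`, `counts`, `total` and `ambiguous` are names I made up for the illustration.

```python
from collections import Counter
from itertools import product

A = ["ae", "af", "ag", "ah"]
B_PRIME = ["cd", "hcd", "bcd", "bhcd", "bbcd"]
C = ["e", "f", "g", "h"]

# One derivation per choice of alternatives for A, B' and C.
counts = Counter("".join(parts) for parts in product(A, B_PRIME, C))

total = sum(counts.values())
ambiguous = [s for s, n in counts.items() if n > 1]

print(total, len(counts), ambiguous)  # 80 80 []
```

If the number of derivations equals the number of distinct strings, no string of terminals has two derivations under these productions.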
We can repeat the analysis from the beginning, here, and find the following: