S -> ABCD
A -> ae | af | ag | ah
B -> b | ε
C -> hcd | bcd | cd
D -> e | f | g | h

I've already tried left factorization on 2 and 4 but I'm stuck with | in many of my productions.

double-beep
  • Welcome to Stack Overflow. Your question is completely unclear to me. Are you able to provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) of the problem you're having? – Pedro Rodrigues Mar 24 '19 at 19:59
  • You need to understand the source of the ambiguity: what part of the grammar is causing it to be ambiguous? Then you can re-express that part in a way that's not ambiguous. – Michael Dyck Mar 25 '19 at 00:19

1 Answer


There are 96 ways (4 × 2 × 3 × 4, one factor per choice of production for A, B, C and D) to derive a string of terminals in this grammar. We suspect that some of these derivations produce the same string, so the generated language contains fewer than 96 strings. We would like to rearrange the grammar so that each derivation yields a distinct string.

We could list all 96 derivations, sort them by the derived string, and then figure out how to avoid the ambiguity that way. That would take a little longer than I'd like, and with some analysis we can probably narrow the search for duplicate strings substantially.
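For a grammar this small, the brute-force listing is also easy to automate. The following is a minimal sketch, not part of the original analysis: the list names and the helper `derivations_by_string` are made up for illustration, and the empty string stands in for ε.

```python
from collections import defaultdict
from itertools import product

# Original grammar: S -> ABCD. "" stands for the empty string (ε).
A = ["ae", "af", "ag", "ah"]
B = ["b", ""]
C = ["hcd", "bcd", "cd"]
D = ["e", "f", "g", "h"]

def derivations_by_string(*alternatives):
    """Map each derived string to the production choices that yield it."""
    table = defaultdict(list)
    for choice in product(*alternatives):
        table["".join(choice)].append(choice)
    return table

table = derivations_by_string(A, B, C, D)
total = sum(len(choices) for choices in table.values())
ambiguous = {s: c for s, c in table.items() if len(c) > 1}

# derivations, distinct strings, strings with more than one derivation
print(total, len(table), len(ambiguous))  # 96 80 16
for s in sorted(ambiguous):
    print(s, ambiguous[s])
```

Every string it reports as ambiguous has exactly two derivations, which is a useful cross-check on whatever the manual analysis finds.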

We have no choice but to use the production S -> ABCD. Next, we must choose exactly one of the productions A -> ae, A -> af, A -> ag, A -> ah. No ambiguity is possible among the choices so far. Next, we must choose either B -> b or B -> ε. Still there is no ambiguity. It is with our choice of the production that replaces C that we introduce ambiguity for the first time. The issue is that cd is a suffix of hcd and bcd, so when cd is appended to another string the result might itself end in hcd or bcd. With this in mind, we find the following duplicate derivations:

S -> ABCD -> axBCD -> axbCD -> axbcdD -> axbcdy
S -> ABCD -> axBCD -> axCD -> axbcdD -> axbcdy

Above, x and y each stand in, independently, for one of the symbols e, f, g or h. The ambiguity arises because we can get the b either from B -> b (followed by C -> cd) or from C -> bcd (preceded by B -> ε).

Before we proceed, we should rewrite the grammar to eliminate this source of ambiguity; there's no point going farther until we get past this. How can we resolve this? In this case, consider what the grammar might look like if we combined symbols A and B into a new symbol A'. Then the productions would be:

A' -> ae | af | ag | ah | aeb | afb | agb | ahb

However, we will find the same problem still exists; the problem was never between the productions for A and those for B, but between those for B and those for C. We might instead try combining B and C into a new symbol B':

B' -> hcd | bcd | cd | bhcd | bbcd

Notice, crucially, that we only listed five terms above, rather than six, because one string, bcd, is generated twice when we combine the productions for B and C (once as b followed by cd, once as ε followed by bcd). When you see this happen, it means you are eliminating ambiguity. Our new grammar looks like:

S  -> AB'D
A  -> ae | af | ag | ah
B' -> cd | hcd | bcd | bhcd | bbcd
D  -> e | f | g | h
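As a sanity check, the rewritten grammar can be compared against the original by enumeration. This is only a sketch under assumptions: it reads the new grammar as a concatenation of an A alternative, a B' alternative, and one of the single symbols e, f, g or h; the names `B_PRIME`, `LAST` and `strings` are invented here; and the empty string stands in for ε.

```python
from itertools import product

# Rewritten grammar: A, then B', then one of e | f | g | h.
A = ["ae", "af", "ag", "ah"]
B_PRIME = ["cd", "hcd", "bcd", "bhcd", "bbcd"]
LAST = ["e", "f", "g", "h"]

# Original grammar, for comparison ("" stands for ε).
B = ["b", ""]
C = ["hcd", "bcd", "cd"]

def strings(*alternatives):
    """Return the derived string for every combination of production choices."""
    return ["".join(choice) for choice in product(*alternatives)]

new = strings(A, B_PRIME, LAST)
old = strings(A, B, C, LAST)

print(len(new) == len(set(new)))  # True: each derivation gives a distinct string
print(set(new) == set(old))       # True: the language is unchanged
```

The first check confirms that no string has two derivations in the new grammar; the second confirms that the rewrite neither added nor lost any strings.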

We can repeat the analysis from the beginning, here, and find the following:

  • we must choose S -> AB'D
  • we must choose one of A -> ae, A -> af, A -> ag, A -> ah, and no choice introduces ambiguity
  • we must choose one of the productions for B', and these cannot introduce ambiguity because all prefixes to which we're appending have the same length (2 symbols) and were unambiguously derived, and the alternatives for B' are pairwise distinct
  • we must choose one of the productions for D, and these cannot introduce ambiguity because all prefixes to which we're appending contain only one instance of d which occurs at the very end, so we can always unambiguously tell where the symbols introduced by the production for B' end and the symbol introduced by the production for D begins
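The argument in the bullets above amounts to a fixed-position split: the first two symbols must come from A, the last symbol from D, and everything in between from B'. A minimal sketch of a recognizer built on that observation follows; the function name `parse` and the use of Python sets are illustrative choices, not something from the original answer.

```python
A = {"ae", "af", "ag", "ah"}
B_PRIME = {"cd", "hcd", "bcd", "bhcd", "bbcd"}
D = {"e", "f", "g", "h"}

def parse(s):
    """Return the (A, B', D) split of s, or None if s is not in the language.

    Every A alternative has length 2 and every D alternative has length 1,
    so there is only one candidate split to test.
    """
    head, middle, tail = s[:2], s[2:-1], s[-1:]
    if head in A and middle in B_PRIME and tail in D:
        return (head, middle, tail)
    return None

print(parse("aebcdf"))   # ('ae', 'bcd', 'f')
print(parse("ahbhcdh"))  # ('ah', 'bhcd', 'h')
print(parse("aebdf"))    # None
```

Because only one split is ever tried, a string that is accepted has exactly one parse, which is what unambiguity means for this grammar.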
Patrick87