
Why do we add a new start variable S0 with the rule S0 -> S when we convert a grammar to Chomsky normal form? What goes wrong if we do not do that?

At first I thought it was because of epsilon rules. But we do not remove an epsilon rule from the start variable. So what is the benefit of adding S0 -> S?

Thanks

A. Mashreghi

2 Answers


Depending on whether the empty string is in the language, you might have the rule $S \to \epsilon$ (or $S_0 \to \epsilon$). If $S$ could also appear on the right-hand side of rules, this rule could delete an arbitrary number of occurrences of $S$. Because we do not want the start symbol to appear on any right-hand side, we introduce a new one.

This way, every application of a rule $A \to BC$ adds exactly one symbol to the sentential form.
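To make the counting concrete, here is a minimal sketch that traces one derivation and records the length of the sentential form after each step. The grammar (S -> AB | AC, C -> SB, A -> a, B -> b) and the helper `apply_rule` are assumptions chosen for illustration; they do not come from the question.

```python
# Hypothetical CNF grammar, for illustration only:
#   S -> AB | AC,  C -> SB,  A -> a,  B -> b

def apply_rule(form, index, rhs):
    """Replace the variable at position `index` by the symbols in `rhs`."""
    return form[:index] + rhs + form[index + 1:]

form = ["S"]
lengths = [len(form)]

# One derivation of "aabb": (position of the variable, right-hand side used)
steps = [
    (0, ["A", "C"]),  # S -> AC   length grows by 1
    (1, ["S", "B"]),  # C -> SB   length grows by 1
    (1, ["A", "B"]),  # S -> AB   length grows by 1
    (0, ["a"]),       # A -> a    length unchanged
    (1, ["a"]),       # A -> a
    (2, ["b"]),       # B -> b
    (3, ["b"]),       # B -> b
]

for index, rhs in steps:
    form = apply_rule(form, index, rhs)
    lengths.append(len(form))

print("".join(form))  # aabb
print(lengths)        # [1, 2, 3, 4, 4, 4, 4, 4]
print(len(steps))     # 7, i.e. 2*4 - 1
```

The form never shrinks, so a string of length 4 needs exactly 3 binary rules and 4 terminal rules.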

Peter Leupold
  • I don't think that causes a problem: even if you have the rule $S \to \epsilon$, you won't remove it, since an epsilon rule is deleted only if its variable is not the start symbol. – A. Mashreghi Jun 04 '16 at 22:25
  • The point is that with CNF you know that the derivation of a string of length n uses n-1 rules of the form A -> BC and n of the form A -> a. The grammar S -> A, A -> AS, A -> a, S -> eps could derive the string a in an arbitrarily large number of ways. This is not what you want from a **normal form**. – Peter Leupold Jun 05 '16 at 07:55
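The claim in the last comment can be checked by brute force. The sketch below counts leftmost derivations of a target string, grouped by number of rule applications. The function `derivation_lengths`, the dict encoding of the rules, and the comparison grammar `CNF_RULES` are assumptions made for this example; only `RULES` is taken from the comment.

```python
from collections import Counter

# Grammar from the comment: S -> A | eps, A -> AS | a  ([] encodes eps)
RULES = {"S": [["A"], []], "A": [["A", "S"], ["a"]]}

# Hypothetical CNF grammar for the same language a*, for comparison only
CNF_RULES = {
    "S0": [["X", "S"], ["a"], []],
    "S": [["X", "S"], ["a"]],
    "X": [["a"]],
}

def derivation_lengths(rules, start, target, max_steps):
    """Map step count -> number of leftmost derivations of `target`."""
    target = tuple(target)
    result = {}
    frontier = Counter({(start,): 1})
    for step in range(1, max_steps + 1):
        next_frontier = Counter()
        for form, ways in frontier.items():
            # Position of the leftmost variable, if any
            pos = next((i for i, s in enumerate(form) if s in rules), None)
            if pos is None:
                continue  # all terminals: nothing left to expand
            for rhs in rules[form[pos]]:
                new_form = form[:pos] + tuple(rhs) + form[pos + 1:]
                # Terminals are never deleted, so too many means a dead end
                if sum(s not in rules for s in new_form) <= len(target):
                    next_frontier[new_form] += ways
        hits = next_frontier.get(target, 0)
        if hits:
            result[step] = hits
        frontier = next_frontier
    return result

print(derivation_lengths(RULES, "S", "a", 8))         # {2: 1, 4: 1, 6: 1, 8: 1}
print(derivation_lengths(CNF_RULES, "S0", "aaa", 8))  # {5: 1}
```

In the first grammar every extra pair A -> AS, S -> eps adds two steps without changing the result, so the list of possible lengths keeps growing with `max_steps`. In the CNF grammar only the length 2n-1 shows up.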

I think I have an explanation. Suppose the grammar looks like this:

S --> S1
S1 --> S
S1 --> a

Then, in the step that removes unit rules, no particular order is prescribed, so we might remove S --> S1 first, which leaves:

S1 --> S1
S1 --> a

and the start variable has disappeared from the grammar entirely.
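For reference, the step being asked about can be sketched as follows. The dict encoding and the helper names `add_start_symbol` and `appears_on_rhs` are assumptions for illustration, not part of any standard library or of the textbook procedure's wording.

```python
def add_start_symbol(rules, start, new_start="S0"):
    """Return (new_rules, new_start) with the extra rule new_start -> start."""
    if new_start in rules:
        raise ValueError(f"{new_start} is already a variable of the grammar")
    new_rules = {new_start: [[start]]}
    # Copy the old rules so the input grammar is left untouched
    new_rules.update({lhs: [list(rhs) for rhs in alts]
                      for lhs, alts in rules.items()})
    return new_rules, new_start

def appears_on_rhs(rules, symbol):
    """True if `symbol` occurs on the right-hand side of any rule."""
    return any(symbol in rhs for alts in rules.values() for rhs in alts)

# The grammar from this answer: S -> S1, S1 -> S | a
rules = {"S": [["S1"]], "S1": [["S"], ["a"]]}
print(appears_on_rhs(rules, "S"))            # True

new_rules, new_start = add_start_symbol(rules, "S")
print(appears_on_rhs(new_rules, new_start))  # False
```

After this step the start variable S0 occurs on no right-hand side, so later steps that rewrite rules for S and S1 cannot affect which variable the derivation starts from.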

A. Mashreghi