Chomsky Normal Form Conversion Algorithm

Question

Why do we add a new start variable S0 with the rule S0 -> S when we convert a grammar to Chomsky normal form? What goes wrong if we skip that step?

At first I thought it was because of the epsilon rules. But we never remove an epsilon rule from the start variable anyway. So what is the benefit of adding S0 -> S?

Thanks

score 1 · Accepted Answer · answered Jun 03 '16 at 15:24

Depending on whether the empty string is in the language, you might have the rule $S \to \epsilon$ (or $S_0 \to \epsilon$). If $S$ could also appear on the right-hand sides of rules, this rule could delete an arbitrary number of occurrences of $S$ during a derivation. Because we do not want the start symbol to appear on any right-hand side, we introduce a new one.

This way the sentential form grows by exactly one symbol with each application of a rule A -> BC.
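The step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary representation of the grammar and the helper names `add_new_start` and `start_on_rhs` are made up for this example and do not come from any library or textbook.

```python
# Hypothetical representation: variable -> list of right-hand sides,
# each right-hand side a tuple of symbols; () stands for epsilon.
GRAMMAR = {
    "S": [("A",), ()],
    "A": [("A", "S"), ("a",)],
}


def start_on_rhs(rules, start):
    """Return True if the start variable occurs on any right-hand side."""
    return any(start in rhs for rhss in rules.values() for rhs in rhss)


def add_new_start(rules, start, new_start="S0"):
    """Return (new_rules, new_start) with the extra rule new_start -> start."""
    if new_start in rules:
        raise ValueError(f"{new_start!r} is already a variable of the grammar")
    new_rules = {new_start: [(start,)]}
    # Copy the old rules unchanged; only the start variable is replaced.
    new_rules.update({lhs: list(rhss) for lhs, rhss in rules.items()})
    return new_rules, new_start


new_rules, new_start = add_new_start(GRAMMAR, "S")
print(start_on_rhs(GRAMMAR, "S"))            # old start occurs on a right-hand side
print(start_on_rhs(new_rules, new_start))    # new start does not
```

After this step an epsilon rule on the new start variable can only be used once, at the very beginning of a derivation, because no later rule can reintroduce that variable.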

Peter Leupold

I don't think that causes a problem: even if you have the rule $S \to \epsilon$, you won't remove it, since an epsilon rule is deleted only if its variable is not the start symbol. – A. Mashreghi Jun 04 '16 at 22:25

The point is that with CNF you know that the derivation of a string of length n ≥ 1 uses n-1 rules of the form A -> BC and n rules of the form A -> a. The grammar S -> A, A -> AS, A -> a, S -> eps could derive the string a in an arbitrarily large number of ways. This is not what you desire from a **normal form**. – Peter Leupold Jun 05 '16 at 07:55
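To make the claim in the comment above concrete, here is a small brute-force sketch that counts the leftmost derivations of a target string in the grammar S -> A, A -> AS, A -> a, S -> eps, grouped by the number of rule applications. The function name `count_leftmost_derivations`, the step bound, and the grammar encoding are assumptions made for this illustration only.

```python
from collections import Counter

# Grammar from the comment: S -> A | eps, A -> A S | a
# () stands for the empty right-hand side (epsilon).
RULES = {
    "S": [("A",), ()],
    "A": [("A", "S"), ("a",)],
}


def count_leftmost_derivations(target, start="S", max_steps=8):
    """Map each step count (up to max_steps) to the number of leftmost
    derivations of `target` that use exactly that many rule applications."""
    counts = Counter()
    frontier = Counter({(start,): 1})  # sentential form -> number of ways
    for step in range(1, max_steps + 1):
        next_frontier = Counter()
        for form, ways in frontier.items():
            # Position of the leftmost variable in the sentential form.
            idx = next(i for i, sym in enumerate(form) if sym in RULES)
            for rhs in RULES[form[idx]]:
                new_form = form[:idx] + rhs + form[idx + 1:]
                terminals = [s for s in new_form if s not in RULES]
                if len(terminals) > len(target):
                    continue  # already too many terminals, prune
                if len(terminals) == len(new_form):
                    # No variables left: the derivation is finished.
                    if "".join(new_form) == target:
                        counts[step] += ways
                else:
                    next_frontier[new_form] += ways
        frontier = next_frontier
    return dict(counts)


print(count_leftmost_derivations("a"))  # {2: 1, 4: 1, 6: 1, 8: 1}
```

Raising `max_steps` keeps adding new derivations of the same one-letter string, whereas a grammar in CNF derives a string of length 1 in exactly one step.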

score 0 · Answer 2 · answered Jun 01 '16 at 02:41

I think I have some explanation. If a grammar is like this:

S --> S1
S1 --> S
S1 --> a

Then, at the step of removing "unit rules", since we do not process them in any specific order, we might remove S --> S1 first, and we would have:

S1 --> S1
S1 --> a

and the start variable is entirely removed.
