
When transforming a context-free grammar into Chomsky Normal Form, we first remove null-productions, then unit-productions, and then useless productions, in exactly this order. I understand that removing null-productions could give rise to unit-productions, which is why unit-productions are removed after null-productions. What I do not understand is what could go wrong if we removed useless productions first and unit-productions afterwards.

  • @rici Thank you. It was my professor who said it during a lecture. I probably didn’t hear correctly what he said. – Tryer outer Jun 19 '22 at 20:03
  • So you claim that I can remove useless productions at the very beginning and the two other removals won’t cause any useless productions? It is the case that I only want to do the process once, so no removal of the same kind should happen twice. – Tryer outer Jun 19 '22 at 20:06
  • The definition of useless productions in the book we are using is: productions from a grammar that can never take part in any derivation to a terminal string. Are your arguments still valid with this definition? – Tryer outer Jun 19 '22 at 22:12
  • A duplicate production could take part in a derivation. An unreachable or ungenerative (some say unproductive) production cannot. If a production is reachable, then you can get to it from the start symbol; if it's generative, you can get to some word from it. So if it's reachable and generative, it's in the derivation for some word. Same definition. – rici Jun 19 '22 at 23:42

1 Answer


If you remove the unit production A → B and that was the only place in the grammar where B was referenced, then B will become unreachable as a result of unit-production elimination, and will need to be removed along with its productions.

That condition requires B to be non-recursive (since a recursive non-terminal refers to itself, and presumably not with a unit production), and any non-terminals referenced in the productions of B will still be referenced, having been absorbed into productions for A.
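A small example may make this concrete. The sketch below assumes one common formulation of unit-production elimination (each unit production A → B is replaced by the non-unit productions of everything A can reach through unit productions); your textbook's algorithm may differ in detail. The grammar, the dict-of-sets representation, and the function names are all made up for illustration.

```python
# Grammar: non-terminal -> set of right-hand sides (tuples of symbols).
# Any symbol that is a key of the dict is a non-terminal; the rest are terminals.

def unit_closure(grammar, start):
    """Non-terminals derivable from `start` using only unit productions."""
    seen, stack = {start}, [start]
    while stack:
        for rhs in grammar[stack.pop()]:
            if len(rhs) == 1 and rhs[0] in grammar and rhs[0] not in seen:
                seen.add(rhs[0])
                stack.append(rhs[0])
    return seen

def remove_unit_productions(grammar):
    """Give each A the non-unit productions of every B with A =>* B."""
    return {
        a: {
            rhs
            for b in unit_closure(grammar, a)
            for rhs in grammar[b]
            if not (len(rhs) == 1 and rhs[0] in grammar)
        }
        for a in grammar
    }

def reachable(grammar, start):
    """Non-terminals that appear in some sentential form derived from `start`."""
    seen, stack = {start}, [start]
    while stack:
        for rhs in grammar[stack.pop()]:
            for sym in rhs:
                if sym in grammar and sym not in seen:
                    seen.add(sym)
                    stack.append(sym)
    return seen

# S -> a A,  A -> B | a,  B -> b C,  C -> c
grammar = {
    "S": {("a", "A")},
    "A": {("B",), ("a",)},
    "B": {("b", "C")},
    "C": {("c",)},
}

before = reachable(grammar, "S")        # B is reachable, but only through A -> B
no_units = remove_unit_productions(grammar)
after = reachable(no_units, "S")        # B has dropped out; C is still referenced via A -> b C
print(sorted(before), sorted(after))
```

Here B was referenced only by A → B, so it is reachable before the elimination and unreachable afterwards, while C stays reachable because B's production was absorbed into A. Had useless productions been removed first, B would have survived that pass and been left behind as dead weight.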

If the grammar does not have a cycle of unit productions allowing A →* A, then unit productions can be topologically sorted and removed in reverse topological order, which guarantees that the unit production elimination doesn't create a new unit production. That makes it possible to remove newly-unreachable non-terminals as you do the unit-production elimination. But I think that textbook algorithms probably don't do that, which is presumably why your textbook wants you to remove useless productions after the grammar has been converted to CNF. (And, of course, there's nothing stopping a grammar from having a cycle of unit productions. Such a grammar would be ambiguous, making it difficult to use in a parser, but this exercise doesn't require that the grammar be useful in a parser.)
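As a rough sketch of the ordering idea, assuming Python 3.9+ (for the standard `graphlib` module) and a made-up dict-of-sets grammar representation: each non-terminal is processed only after every non-terminal it has a unit production to, so whatever gets copied in is already free of unit productions. The function names are hypothetical, and this is not claimed to be any particular textbook's algorithm.

```python
from graphlib import TopologicalSorter, CycleError

def is_unit(grammar, rhs):
    """A right-hand side is a unit production if it is a single non-terminal."""
    return len(rhs) == 1 and rhs[0] in grammar

def remove_units_in_order(grammar):
    # For each non-terminal A, the set of B such that A -> B is a unit production.
    deps = {
        a: {rhs[0] for rhs in rules if is_unit(grammar, rhs)}
        for a, rules in grammar.items()
    }
    # static_order() yields a node only after all of its dependencies,
    # so B comes before A whenever A -> B. It raises CycleError if the
    # unit productions form a cycle.
    order = list(TopologicalSorter(deps).static_order())

    result = {a: set(rules) for a, rules in grammar.items()}
    for a in order:
        for b in deps[a]:
            result[a].discard((b,))   # drop A -> B
            result[a] |= result[b]    # B was processed earlier: no units left in it
    return result

# S -> A | a b,  A -> B | a,  B -> b
grammar = {
    "S": {("A",), ("a", "b")},
    "A": {("B",), ("a",)},
    "B": {("b",)},
}
ordered = remove_units_in_order(grammar)
print(ordered["S"])
```

Because every substitution copies in productions that contain no unit productions, no new unit production is ever created, and a reachability check could be run at the moment each unit production is dropped. For a grammar with a unit cycle such as A → B, B → A, the sorter raises `CycleError` and this approach does not apply as written.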

Similarly, if the only production for a non-terminal is an ε-production, then that non-terminal will end up with no productions after null-productions are removed (and it will also be unreachable). Again, that could be handled in a way which doesn't require deferring reachability analysis, but the textbook algorithm probably doesn't do that.
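To illustrate with a tiny made-up grammar, here is a sketch of the usual null-production removal (find the nullable non-terminals, then emit every variant of each right-hand side with nullable symbols optionally dropped). The representation and function names are assumptions for the example, and the sketch ignores the special case where the start symbol itself is nullable.

```python
from itertools import product

def nullable_set(grammar):
    """Non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            # An empty right-hand side () is trivially all-nullable.
            if a not in nullable and any(all(s in nullable for s in rhs) for rhs in rules):
                nullable.add(a)
                changed = True
    return nullable

def remove_null_productions(grammar):
    nullable = nullable_set(grammar)
    result = {}
    for a, rules in grammar.items():
        new_rules = set()
        for rhs in rules:
            # Each nullable symbol may either be kept or dropped.
            options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for choice in product(*options):
                new_rhs = tuple(s for part in choice for s in part)
                if new_rhs:               # never emit an empty right-hand side
                    new_rules.add(new_rhs)
        result[a] = new_rules
    return result

# S -> a A b,  A -> ε   (the empty tuple stands for ε)
grammar = {
    "S": {("a", "A", "b")},
    "A": {()},
}
no_nulls = remove_null_productions(grammar)
print(no_nulls)
```

After the pass, A has no productions at all, so it can no longer derive anything, and a useless-production pass run before this step would not have caught it.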

rici
  • Thank you! But when I remove the unit production A → B, don't I have to replace A by B everywhere, so that B will always remain referenced? Could you maybe add a little example to your answer? – Tryer outer Jun 20 '22 at 11:21
  • @tryer: I see that Peter Linz uses a different algorithm for removing unit productions. I'll see if I can fix the answer to be consistent with both algorithms. – rici Jun 20 '22 at 16:52