How to find FIRST and FOLLOW sets of a recursive grammar?

Question

Suppose I have the following CFG.

A -> B | Cx | EPSILON
B -> C | yA
C -> B | w | z

Now if I try to find

FIRST(C) = FIRST(B) U FIRST(w) U FIRST(z)
         = FIRST(C) U FIRST(yA) U {w, z}

That is, I'm going in a loop. Thus I assume I have to convert it into a form which has immediate left recursion, which I can do as follows.

A -> B | Cx | EPSILON
B -> C | yA
C -> C | yA | w | z

Now if I try to calculate FIRST sets, I think I can get it done as follows.

FIRST(C) = FIRST(C) U FIRST(yA) U FIRST(w) U FIRST(z)
         = { y, w, z } // I ignore FIRST(C)
FIRST(B) = FIRST(C) U FIRST(yA)
         = { y, w, z }
FIRST(A) = FIRST(B) U FIRST(Cx) U FIRST(EPSILON)
         = { y, w, z, EPSILON }

Am I correct there?

But even if I'm right there, I still run into a problem when I try to calculate FOLLOW sets from this grammar.

FOLLOW(A) = { $ } U FOLLOW(B) U FOLLOW(C)

I get FOLLOW(B) from the 2nd rule and FOLLOW(C) from the 3rd rule. But to calculate FOLLOW(B), I need FOLLOW(A) (from the 1st grammar rule), so again I'm stuck in a loop.

Any help? Thanks in advance!

score 15 · Accepted Answer · answered Mar 22 '15 at 22:56

Since FIRST and FOLLOW are (normally) recursive, it's useful to think of them as systems of equations to be solved. The solution can be found with a simple incremental algorithm: repeatedly apply all the right-hand sides until a full round leaves every set unchanged.

So let's take the FOLLOW relation for the given grammar:

A → B | Cx | ε
B → C | yA
C → B | w | z

We can directly derive the equations:

FOLLOW(A) = FOLLOW(B) ∪ {$}
FOLLOW(B) = FOLLOW(A) ∪ FOLLOW(C)
FOLLOW(C) = FOLLOW(B) ∪ {x}

So we initially set all the follow sets to {}, and proceed. Each equation is evaluated in order, using the most recently computed value of every set.

First round:

FOLLOW(A) = {} ∪ {$} = {$}
FOLLOW(B) = {$} ∪ {} = {$}
FOLLOW(C) = {$} ∪ {x} = {$,x}

Second round:

FOLLOW(A) = {$} ∪ {$} = {$}
FOLLOW(B) = {$} ∪ {$,x} = {$,x}
FOLLOW(C) = {$,x} ∪ {x} = {$,x}

Third round:

FOLLOW(A) = {$,x} ∪ {$} = {$,x}
FOLLOW(B) = {$,x} ∪ {$,x} = {$,x}
FOLLOW(C) = {$,x} ∪ {x} = {$,x}

Fourth round:

FOLLOW(A) = {$,x} ∪ {$} = {$,x}
FOLLOW(B) = {$,x} ∪ {$,x} = {$,x}
FOLLOW(C) = {$,x} ∪ {x} = {$,x}

Here we stop because no changes were made in the last round.

This algorithm must terminate because there are a finite number of symbols, and each round can only add symbols to the sets. It is not the most efficient technique, although it is generally good enough in practice.
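For illustration, here is a minimal Python sketch of the round-based iteration, applied to both FIRST and FOLLOW for the grammar in the question. The grammar encoding and the names `GRAMMAR`, `EPS`, `END`, `first_of_sequence`, `compute_first` and `compute_follow` are my own choices for this example, not part of any standard tool.

```python
EPS = "ε"   # marker for the empty string (assumed notation)
END = "$"   # end-of-input marker

# Grammar from the question. Each alternative is a list of symbols;
# the empty list stands for an ε-production.
GRAMMAR = {
    "A": [["B"], ["C", "x"], []],
    "B": [["C"], ["y", "A"]],
    "C": [["B"], ["w"], ["z"]],
}
START = "A"


def first_of_sequence(symbols, first):
    """FIRST of a string of symbols, using the current FIRST sets."""
    result = set()
    for sym in symbols:
        if sym not in GRAMMAR:              # terminal: it starts the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:           # sym cannot vanish: stop here
            return result
    result.add(EPS)                         # every symbol can vanish (or string is empty)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                          # one pass of the loop body = one round
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:  # FOLLOW is only defined for non-terminals
                        continue
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:         # whatever follows nt can follow sym too
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(FIRST)
print(FOLLOW)
```

On this grammar the sketch yields FIRST(B) = FIRST(C) = {y, w, z}, FIRST(A) = {y, w, z, ε}, and FOLLOW = {$, x} for all three non-terminals, which agrees with the FIRST sets proposed in the question and with the rounds worked out by hand.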

What IS the "most efficient technique" for calculating FIRST? I think it is to start at the goal symbol and recursively follow each RHS, propagating the FIRST sets up the tree. Loop detection gets tricky, though. — David Spector, Jul 19 '20 at 18:04
@david: The most efficient technique is to compute the image of a transitive closure using some variant of Tarjan's algorithm (see [Esko Nuutila, 1994](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.5639) for some techniques to optimise this algorithm, although I'm sure that there is more recent research as well.) (The image of a transitive closure takes a relation R and a function F and computes R*F; as Nuutila points out, the TC algorithm can be quicker if you find the union of image sets rather than finding the union of node sets and then compute the image.) — rici, Jul 19 '20 at 20:09
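
To give a rough idea of the approach in the comment above, here is a sketch that solves a system of set-inclusion equations with a plain recursive Tarjan SCC pass. It is not Nuutila's optimised algorithm, and the function name `solve_inclusions` and the `init`/`includes` encoding are assumptions made for this example. Sets that depend on each other in a cycle land in the same strongly connected component and receive one shared result, so each component is processed only once.

```python
def solve_inclusions(init, includes):
    """Solve S[x] = init[x] ∪ (union of S[y] for y in includes[x])."""
    index = {}          # discovery order of each node
    low = {}            # smallest discovery index reachable from the node
    stack = []          # nodes whose component is not finished yet
    on_stack = set()
    solution = {}

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for succ in includes.get(node, ()):
            if succ not in index:
                visit(succ)
                low[node] = min(low[node], low[succ])
            elif succ in on_stack:
                low[node] = min(low[node], index[succ])
        if low[node] != index[node]:
            return
        # node is the root of a component: pop all of its members
        component = []
        while True:
            member = stack.pop()
            on_stack.discard(member)
            component.append(member)
            if member == node:
                break
        # Components reachable from this one are already finished, so
        # their solutions are final and can simply be merged in.
        merged = set()
        for member in component:
            merged |= init.get(member, set())
            for succ in includes.get(member, ()):
                if succ in solution:
                    merged |= solution[succ]
        for member in component:
            solution[member] = set(merged)

    for node in sorted(set(init) | set(includes)):
        if node not in index:
            visit(node)
    return solution


# The FOLLOW equations from the answer: FOLLOW(A) = FOLLOW(B) ∪ {$}, etc.
follow = solve_inclusions(
    init={"A": {"$"}, "B": set(), "C": {"x"}},
    includes={"A": ["B"], "B": ["A", "C"], "C": ["B"]},
)
print(follow)
```

Here all three FOLLOW sets form a single cycle, so they end up in one component with the value {$, x}. Because `visit` is recursive, a very large grammar could hit Python's recursion limit; an explicit stack would avoid that.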
