Suppose I have the following CFG.

A -> B | Cx | EPSILON
B -> C | yA
C -> B | w | z

Now if I try to find

FIRST(C) = FIRST(B) U FIRST(w) U FIRST(z)
         = FIRST(C) U FIRST(yA) U {w, z}

That is, I'm going in a loop. Thus I assume I have to convert it into a form which has immediate left recursion, which I can do as follows.

A -> B | Cx | EPSILON
B -> C | yA
C -> C | yA | w | z

Now if I try to calculate FIRST sets, I think I can get it done as follows.

FIRST(C) = FIRST(C) U FIRST(yA) U FIRST(w) U FIRST(z)
         = { y, w, z } // I ignore FIRST(C)
FIRST(B) = FIRST(C) U FIRST(yA)
         = { y, w, z }
FIRST(A) = FIRST(B) U FIRST(Cx) U FIRST(EPSILON)
         = { y, w, z, EPSILON }

Am I correct there?

But even if I'm right there, I still run into a problem when I try to calculate FOLLOW sets from this grammar.

FOLLOW(A) = { $ } U FOLLOW(B) U FOLLOW(C)

I get FOLLOW(B) from the 2nd rule and FOLLOW(C) from the 3rd rule. But to calculate FOLLOW(B), I need FOLLOW(A) (from the 1st grammar rule), so again I'm stuck in a loop.

Any help? Thanks in advance!

Qantas 94 Heavy
Sach

1 Answer

Since FIRST and FOLLOW are (normally) recursive, it's useful to think of them as systems of equations to be solved. The solution can be found with a simple incremental algorithm: repeatedly apply all the right-hand sides until a full cycle leaves every set unchanged.

So let's take the FOLLOW relation for the given grammar:

A → B | Cx | ε
B → C | yA
C → B | w | z

We can directly derive the equations:

FOLLOW(A) = FOLLOW(B) ∪ {$}
FOLLOW(B) = FOLLOW(A) ∪ FOLLOW(C)
FOLLOW(C) = FOLLOW(B) ∪ {x}

So we initially set all the follow sets to {}, and proceed.

First round:

FOLLOW(A) = {} ∪ {$} = {$}
FOLLOW(B) = {$} ∪ {} = {$}
FOLLOW(C) = {$} ∪ {x} = {$,x}

Second round:

FOLLOW(A) = {$} ∪ {$} = {$}
FOLLOW(B) = {$} ∪ {$,x} = {$,x}
FOLLOW(C) = {$,x} ∪ {x} = {$,x}

Third round:

FOLLOW(A) = {$,x} ∪ {$} = {$,x}
FOLLOW(B) = {$,x} ∪ {$,x} = {$,x}
FOLLOW(C) = {$,x} ∪ {x} = {$,x}

Fourth round:

FOLLOW(A) = {$,x} ∪ {$} = {$,x}
FOLLOW(B) = {$,x} ∪ {$,x} = {$,x}
FOLLOW(C) = {$,x} ∪ {x} = {$,x}

Here we stop because no changes were made in the last round.
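The rounds above can be reproduced mechanically. The following is a minimal Python sketch, not a definitive implementation: the `EQUATIONS` dictionary and the `solve` helper are names made up for this illustration, and the three equations are hard-coded from the ones derived above rather than extracted from the grammar.

```python
# The three FOLLOW equations from the answer, written as functions of the
# current sets. This dictionary layout is an assumption made for the sketch.
EQUATIONS = {
    "A": lambda f: f["B"] | {"$"},
    "B": lambda f: f["A"] | f["C"],
    "C": lambda f: f["B"] | {"x"},
}


def solve(equations):
    """Re-evaluate every equation until a full round changes nothing."""
    sets = {name: set() for name in equations}  # all sets start empty
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        for name, rhs in equations.items():
            # Uses sets already updated earlier in the same round,
            # exactly as the hand-worked rounds do.
            new = rhs(sets)
            if new != sets[name]:
                sets[name] = new
                changed = True
    return sets, rounds


follow, rounds = solve(EQUATIONS)
print(follow, rounds)
```

Under these assumptions the loop stops in the fourth round, the first one in which nothing changes, with every FOLLOW set equal to {$, x}.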

This algorithm must terminate because there are a finite number of symbols, and each round can only add symbols to sets. It is not the most efficient technique, although it is generally good enough in practice.
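The same iterate-until-stable idea applies to the FIRST sets asked about in the question, with no need to rewrite the grammar first. The sketch below is one way to do it in Python; the list-of-lists grammar encoding, the `EPS` marker, and the `first_sets` function are assumptions chosen for this example.

```python
EPS = "ε"

# Each nonterminal maps to its alternatives; [] stands for the ε production.
GRAMMAR = {
    "A": [["B"], ["C", "x"], []],
    "B": [["C"], ["y", "A"]],
    "C": [["B"], ["w"], ["z"]],
}


def first_sets(grammar):
    """Grow the FIRST sets until a full pass over the grammar adds nothing."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                before = len(first[nt])
                for sym in rhs:
                    if sym not in grammar:       # terminal: it starts the string
                        first[nt].add(sym)
                        break
                    first[nt] |= first[sym] - {EPS}
                    if EPS not in first[sym]:    # sym cannot vanish, stop here
                        break
                else:
                    # Empty right-hand side, or every symbol can derive ε.
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first


first = first_sets(GRAMMAR)
print(first)
```

For this grammar the sketch yields FIRST(B) = FIRST(C) = {y, w, z} and FIRST(A) = {y, w, z, ε}, which agrees with the sets worked out by hand in the question.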

rici
  • Thanks bro, this helps me a lot! – Jiahao Cai Jul 02 '17 at 11:06
  • Thanks a lot. Really helped me! – Zephyr Aug 08 '17 at 10:09
  • What IS the "most efficient technique" for calculating FIRST? I think it is to start at the goal symbol and recursively follow each RHS, propagating the FIRST sets up the tree. Loop detection gets tricky, though. – David Spector Jul 19 '20 at 18:04
  • @david: The most efficient technique is to compute the image of a transitive closure using some variant of Tarjan's algorithm (see [Esko Nuutila, 1994](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.5639) for some techniques to optimise this algorithm, although I'm sure that there is more recent research as well.) (The image of a transitive closure takes a relation R and a function F and computes R*F; as Nuutila points out, the TC algorithm can be quicker if you find the union of image sets rather than finding the union of node sets and then compute the image.) – rici Jul 19 '20 at 20:09