
I've been trying to calculate the follow set of a grammar for some time now, and have run into yet another problem. Here is my follow set calculator:

def gen_follow_set(grammar, start_sym, first_sets):
    follow_sets = {nterm: set() for nterm in grammar}
    follow_sets[start_sym].add("$")
    for _, prods in grammar.items():
        for alt in prods:
            for item in alt:
                if item.isupper():
                    follow_sets[item] = set()
    while True:
        changes = copy.deepcopy(follow_sets)
        for nterm, prods in grammar.items():
            for alt in prods:
                for i, item in enumerate(alt):
                    la = alt[i + 1] if i + 1 != len(alt) else nterm
                    if i == len(alt) - 1 and item != "":
                        follow_sets[item] |= follow_sets[nterm]
                    elif item != "":
                        if "" in first_sets[la]:
                            follow_sets[item] |= first_sets[la].union(
                                first_sets[alt[i + 2] if i + 2 <= len(alt) -
                                           1 else nterm]) - {""}
                        else:
                            follow_sets[item] |= first_sets[la]
        if changes == follow_sets:
            return follow_sets

This is called like this:

grammar = {
    "expr": [["term", "etail"]],
    "term": [["LPAREN", "expr", "RPAREN"], ["INT", "ttail"]],
    "etail": [["PLUS", "expr"], [""]],
    "ttail": [["TIMES", "term"], [""]]
}
first = calc_first_set(...)
pprint.pprint(gen_follow_set(grammar, "expr", first))

This outputs:

Working on: term ; la has epsilon; la: etail
Working on: etail ; it is at the end of the production: expr
Working on: LPAREN ; la doesn't have epsilon; la: expr
Working on: expr ; la doesn't have epsilon; la: RPAREN
Working on: RPAREN ; it is at the end of the production: term
Working on: INT ; la has epsilon; la: ttail
Working on: ttail ; it is at the end of the production: term
Working on: PLUS ; la doesn't have epsilon; la: expr
Working on: expr ; it is at the end of the production: etail
Working on: TIMES ; la doesn't have epsilon; la: term
Working on: term ; it is at the end of the production: ttail
Working on: term ; la has epsilon; la: etail
Working on: etail ; it is at the end of the production: expr
Working on: LPAREN ; la doesn't have epsilon; la: expr
Working on: expr ; la doesn't have epsilon; la: RPAREN
Working on: RPAREN ; it is at the end of the production: term
Working on: INT ; la has epsilon; la: ttail
Working on: ttail ; it is at the end of the production: term
Working on: PLUS ; la doesn't have epsilon; la: expr
Working on: expr ; it is at the end of the production: etail
Working on: TIMES ; la doesn't have epsilon; la: term
Working on: term ; it is at the end of the production: ttail
Working on: term ; la has epsilon; la: etail
Working on: etail ; it is at the end of the production: expr
Working on: LPAREN ; la doesn't have epsilon; la: expr
Working on: expr ; la doesn't have epsilon; la: RPAREN
Working on: RPAREN ; it is at the end of the production: term
Working on: INT ; la has epsilon; la: ttail
Working on: ttail ; it is at the end of the production: term
Working on: PLUS ; la doesn't have epsilon; la: expr
Working on: expr ; it is at the end of the production: etail
Working on: TIMES ; la doesn't have epsilon; la: term
Working on: term ; it is at the end of the production: ttail
{'INT': {'INT', 'TIMES', 'LPAREN'},
 'LPAREN': {'INT', 'LPAREN'},
 'PLUS': {'INT', 'LPAREN'},
 'RPAREN': {'INT', 'LPAREN', 'PLUS'},
 'TIMES': {'INT', 'LPAREN'},
 'etail': {'$', 'RPAREN'},
 'expr': {'$', 'RPAREN'},
 'term': {'INT', 'LPAREN', 'PLUS'},
 'ttail': {'INT', 'LPAREN', 'PLUS'}}

etail and expr are correct, but term and ttail aren't. How can I get the correct answer?

Jonathan Hall
xilpex
  • Please reduce and enhance this into the expected [MRE](https://stackoverflow.com/help/minimal-reproducible-example). Show where the intermediate results deviate from the ones you expect. You're close, but we need a bit more explanation *and* that reproducible code. Don't ask us to sight-check your algorithm. – Prune Apr 28 '20 at 06:02

1 Answer


Whenever the non-terminal N appears in a production

M → α N β

we have

  1. FIRST(β) ⊂ FOLLOW(N)

  2. If β is nullable, then FOLLOW(M) ⊂ FOLLOW(N)
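Both rules depend on β as a whole, not just on its first symbol. A minimal sketch of a helper that computes FIRST(β) and the nullability of β for a sequence of symbols, assuming the question's convention that first_sets maps every symbol (terminals included) to its FIRST set and uses "" for ε. The name first_of_sequence and the hand-written FIRST sets are illustrative and not part of the original code:

```python
def first_of_sequence(symbols, first_sets):
    """Return (FIRST of the symbol sequence without "", True if it is nullable)."""
    result = set()
    for sym in symbols:
        if sym == "":                  # an explicit epsilon entry contributes nothing
            continue
        result |= first_sets[sym] - {""}
        if "" not in first_sets[sym]:
            return result, False       # sym cannot vanish, so later symbols are hidden
    return result, True                # every symbol (or none at all) can derive epsilon


# FIRST sets for the grammar in the question, worked out by hand (an assumption,
# since calc_first_set is not shown).
first_sets = {
    "expr": {"LPAREN", "INT"}, "term": {"LPAREN", "INT"},
    "etail": {"PLUS", ""}, "ttail": {"TIMES", ""},
    "LPAREN": {"LPAREN"}, "RPAREN": {"RPAREN"}, "INT": {"INT"},
    "PLUS": {"PLUS"}, "TIMES": {"TIMES"},
}

# beta = [etail] in expr -> term etail: contributes PLUS and is nullable.
print(first_of_sequence(["etail"], first_sets))
# beta = [] (N is the last symbol): contributes nothing and is nullable.
print(first_of_sequence([], first_sets))
```

The first element of the result goes into FOLLOW(N) by rule 1; the second says whether rule 2 applies.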

Your code works correctly if β is empty (i.e. N is at the end of the production) or if the first symbol in β is not nullable. In the remaining cases, your code has errors:

  • If the first symbol in β is nullable, you compute FIRST(β) as the union of the FIRST sets of the first two symbols in β. Since you never check whether the second (or subsequent) symbols are nullable, you might miss symbols in FIRST(β).

  • Another result of only testing nullability of the next symbol is that you don't compute NULLABLE(β); instead you use the nullability of the first symbol in β. So you might miss symbols in FOLLOW(M).

I don't believe either of those bugs is triggered by your actual grammar. But the next one is:

  • In the case that your (insufficient) test reveals that β is nullable, you use FIRST(M) instead of FOLLOW(M).

  • A closely related problem is the computation of la, which proposes nterm as the next symbol if the end of the production has been reached. That would lead to using FIRST(nterm) rather than FOLLOW(nterm), but of course that can never happen, since the only code branch which uses la does not execute if N is at the end of the production. That being the case, la is actually unnecessary.
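One way the points above could be addressed, as a sketch rather than a definitive fix: walk all of β instead of looking one or two symbols ahead, and add FOLLOW(M) (not FIRST(M)) when β turns out to be nullable. It keeps the question's function name and its "" convention for ε, but only builds FOLLOW sets for non-terminals (the grammar's keys). The first_sets dictionary is written out by hand here as an assumption, since calc_first_set is not shown:

```python
def gen_follow_set(grammar, start_sym, first_sets):
    follow_sets = {nterm: set() for nterm in grammar}
    follow_sets[start_sym].add("$")
    changed = True
    while changed:                                 # iterate to a fixed point
        changed = False
        for nterm, prods in grammar.items():
            for alt in prods:
                for i, item in enumerate(alt):
                    if item not in grammar:        # skip terminals and ""
                        continue
                    new = set()
                    for sym in alt[i + 1:]:        # beta = everything after item
                        if sym == "":
                            continue
                        new |= first_sets[sym] - {""}
                        if "" not in first_sets[sym]:
                            break                  # beta is not nullable
                    else:
                        # beta is empty or nullable: add FOLLOW(M), not FIRST(M)
                        new |= follow_sets[nterm]
                    if not new <= follow_sets[item]:
                        follow_sets[item] |= new
                        changed = True
    return follow_sets


grammar = {
    "expr": [["term", "etail"]],
    "term": [["LPAREN", "expr", "RPAREN"], ["INT", "ttail"]],
    "etail": [["PLUS", "expr"], [""]],
    "ttail": [["TIMES", "term"], [""]],
}
# Hand-computed FIRST sets (assumption), keyed by terminals and non-terminals.
first_sets = {t: {t} for t in ("LPAREN", "RPAREN", "INT", "PLUS", "TIMES")}
first_sets.update({
    "expr": {"LPAREN", "INT"}, "term": {"LPAREN", "INT"},
    "etail": {"PLUS", ""}, "ttail": {"TIMES", ""},
})
follow_sets = gen_follow_set(grammar, "expr", first_sets)
```

The for/else runs the else branch only when the loop over β finishes without hitting a non-nullable symbol, which covers both "N is last" and "everything after N can vanish".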

rici
  • Thanks for the great answer! Forgive my ignorance, but I don't understand the last point. For example, why is `la` unnecessary? And maybe how I could fix it. Thanks! – xilpex Apr 28 '20 at 19:29
  • @Xilpex: I guess it's not quite accurate to say that `la` is unnecessary. What's unnecessary is the conditional `la = alt[i + 1] if i + 1 != len(alt) else nterm`. If the `else` clause is chosen, `i + 1 == len(alt)` and in that case the following `if` statement will fire and `la` will never get used. So `la` can only be used if its value is `alt[i + 1]`, so you might as well just write `alt[i + 1]` instead of precomputing `la` and then quite possibly not using it. – rici Apr 28 '20 at 19:47
  • But no simple fix can be applied if you want to solve the first problems I mentioned. You need to rejig your computation. Personally, I find it easier to ask "which follow set(s) should this first set be added to?" than to ask "which first sets should be added to this follow set?" – rici Apr 28 '20 at 19:50
  • Ok, cool, I fixed the `la` thing, but it still doesn't get the correct FOLLOW set. My term FOLLOW set is currently (after the fix): `{'$', 'PLUS', 'RPAREN'}`. – xilpex Apr 28 '20 at 19:52
  • @xilpex: follow set for what? And what do you expect? – rici Apr 28 '20 at 20:02
  • FOLLOW set for `term` and `ttail`. I expect `{+, @}` to be their follow sets. – xilpex Apr 28 '20 at 20:03
  • @xilpex: @? Do you mean $? Anyway, it's clear that term can be followed by `)` – rici Apr 28 '20 at 20:06
  • I meant epsilon. Here is where I am getting my grammar checked: https://pastebin.com/pyJS7Jn4 – xilpex Apr 28 '20 at 20:24
  • @xilpex: Epsilon is never in a follow set, so that tool is flawed. You can test a FOLLOW set by hand for a simple grammar. The question is simple: is there a derivation where symbol `a` follows non-terminal `N`? (In fact, putting ε into first sets instead of using a separate `nullable` predicate is a hack, which has no place in the underlying mathematical formalism.) – rici Apr 28 '20 at 20:36
  • Oh, ok. So `{'PLUS', '$', 'RPAREN'}` as `term`'s follow set is correct? – xilpex Apr 28 '20 at 21:21
  • @xilpex: `term` and `ttail` certainly have the same FOLLOW set. Other than that, `term` can appear before `etail` or (because `etail` is nullable) at the end of `expr`. `etail` can start with `PLUS` and `expr` can be followed by `RPAREN` and `$`. So that's the FOLLOW set for `term` afaics. There's no magic to FIRST and FOLLOW sets. They just represent what they claim to represent: symbols which can start or follow the non-terminal. – rici Apr 28 '20 at 21:43
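
The two suggestions in the comments (a separate nullable predicate instead of ε inside FIRST sets, and asking "which follow sets should this first set be added to?") can be sketched as follows. This is one possible arrangement under those assumptions, not the answerer's own code; the function names compute_nullable, compute_first and compute_follow are hypothetical, and terminals are taken to be any symbol that is not a key of the grammar:

```python
EPS = ""

def compute_nullable(grammar):
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nterm, prods in grammar.items():
            # nterm is nullable if some alternative consists only of "" or nullable symbols.
            if nterm not in nullable and any(
                    all(s == EPS or s in nullable for s in alt) for alt in prods):
                nullable.add(nterm)
                changed = True
    return nullable

def compute_first(grammar, nullable):
    first = {nterm: set() for nterm in grammar}    # no epsilon stored in these sets
    changed = True
    while changed:
        changed = False
        for nterm, prods in grammar.items():
            for alt in prods:
                for sym in alt:
                    if sym == EPS:
                        continue
                    add = first[sym] if sym in grammar else {sym}
                    if not add <= first[nterm]:
                        first[nterm] |= add
                        changed = True
                    if sym not in nullable:
                        break
    return first

def compute_follow(grammar, start_sym, first, nullable):
    follow = {nterm: set() for nterm in grammar}
    follow[start_sym].add("$")
    changed = True
    while changed:
        changed = False
        for nterm, prods in grammar.items():
            for alt in prods:
                syms = [s for s in alt if s != EPS]
                for j in range(1, len(syms) + 1):
                    # What can appear at position j: FIRST of that symbol, or
                    # FOLLOW(nterm) once we are past the end of the production.
                    if j == len(syms):
                        add = follow[nterm]
                    else:
                        add = first[syms[j]] if syms[j] in grammar else {syms[j]}
                    # Push it back onto each preceding non-terminal it can reach.
                    for k in range(j - 1, -1, -1):
                        if syms[k] in grammar and not add <= follow[syms[k]]:
                            follow[syms[k]] |= add
                            changed = True
                        if syms[k] not in nullable:
                            break
    return follow

grammar = {
    "expr": [["term", "etail"]],
    "term": [["LPAREN", "expr", "RPAREN"], ["INT", "ttail"]],
    "etail": [["PLUS", "expr"], [""]],
    "ttail": [["TIMES", "term"], [""]],
}
nullable = compute_nullable(grammar)
first = compute_first(grammar, nullable)
follow = compute_follow(grammar, "expr", first, nullable)
```

For the grammar in the question this gives `term` and `ttail` the same FOLLOW set, PLUS, RPAREN and `$`, which matches the by-hand reasoning in the last comment, and ε never appears in any FIRST or FOLLOW set.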