How can I simplify a recursive-descent parser?

Question

I have the following simple LL(1) grammar, which describes a language with only three valid sentences: "", "x y" and "z x y":

S -> A x y | ε .
A -> z | ε .

I have constructed the following parsing table, and from it a "naive" recursive-descent parser:

  | x          | y | z          | $
S | S -> A x y |   | S -> A x y | S -> ε
A | A -> ε     |   | A -> z     |

func S():
    if next() in ['x', 'z']:
        A()
        expect('x')
        expect('y')
        expect('$')
    elif next() == '$':
        pass
    else:
        error()

func A():
    if next() == 'x':
        pass
    elif next() == 'z':
        expect('z')
    else:
        error()

However, the function A seems to be more complicated than necessary. All of my tests still pass if it's simplified to:

func A():
    if next() == 'z':
        expect('z')

Is this a valid simplification of A? If so, are there any general rules regarding when it's valid to make simplifications like this one?

score 1 · Answer 1 · answered Sep 26 '20 at 15:38

That simplification is certainly valid (and quite common).

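The main difference is that the simplified function has no code associated with the production A→ε. If that production has semantics to implement, you will still need to test for its condition (here, that the next token is x) so the action runs only when the production is actually chosen. If you only need to skip over the nullable production, you can certainly just return.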
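
As a minimal sketch of that point, assume the tokens are single characters in a list terminated by '$', and that `on_empty` is a hypothetical callback standing in for whatever the A→ε production has to do. Neither `parse_A` nor `on_empty` comes from the question; they are names made up for illustration.

```python
def parse_A(tokens, pos, on_empty):
    """Parse A -> z | ε starting at tokens[pos]; return the position after A."""
    if tokens[pos] == 'z':      # A -> z
        return pos + 1
    if tokens[pos] == 'x':      # A -> ε, selected by the lookahead x
        on_empty()              # semantic action for the empty production
        return pos
    # Keeping this branch means the action never runs on invalid input.
    raise SyntaxError(f"unexpected {tokens[pos]!r} while parsing A")
```

If the explicit test for x were dropped, the action would also run on inputs that turn out to be errors, which may or may not matter depending on what the action does.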

Coalescing errors with epsilon productions has one other consequence: an error (for example, on the input y) is detected later, after A() returns, by whichever check comes next in the caller. Sometimes that makes it harder to produce good error messages (and sometimes it doesn't).
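
The difference in where the error surfaces can be sketched in Python. This is not the asker's code: the names `Recognizer`, `ParseError` and `where_error` are made up for illustration, and tokens are assumed to be single characters. S is also written in the coalesced style here, because the S in the question only calls A when the next token is x or z, so A's own error branch is never reached from it.

```python
class ParseError(Exception):
    def __init__(self, where, found):
        super().__init__(f"{where}: unexpected {found!r}")
        self.where = where  # name of the function that detected the error


class Recognizer:
    def __init__(self, text, simplified_a):
        self.tokens = list(text) + ['$']  # '$' marks the end of the input
        self.pos = 0
        self.simplified_a = simplified_a

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok, where):
        if self.peek() != tok:
            raise ParseError(where, self.peek())
        self.pos += 1

    def S(self):
        # Coalesced S: anything other than end of input is handed to A.
        if self.peek() == '$':
            return
        self.A()
        self.expect('x', 'S')
        self.expect('y', 'S')
        self.expect('$', 'S')

    def A(self):
        if self.peek() == 'z':
            self.expect('z', 'A')
        elif self.simplified_a or self.peek() == 'x':
            pass  # A -> ε (the simplified form also lets errors fall through)
        else:
            raise ParseError('A', self.peek())


def where_error(text, simplified_a):
    """Return None if text is accepted, else the function that reported the error."""
    try:
        Recognizer(text, simplified_a).S()
    except ParseError as err:
        return err.where
    return None


print(where_error("y", simplified_a=False))  # A
print(where_error("y", simplified_a=True))   # S
```

Both variants accept exactly the same strings; only the place where the error is reported moves from A to its caller.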

rici


Thank you for answering. If I only need to build a recognizer for a language (and I'm working with an LL(1) grammar), is it safe to perform this coalescing of errors and epsilon productions throughout? Or is it only valid in certain places? Also, are there any other common simplifications I should be aware of? – user200783 Sep 27 '20 at 06:24
@user200783: yes, it's safe if you know the language is LL(1). Another common optimisation is unit rule elimination. – rici Sep 27 '20 at 07:12
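
For readers unfamiliar with the term: a unit rule is a production whose right-hand side is a single nonterminal, such as E → T, and eliminating it lets E use T's alternatives directly. The sketch below shows one way to do this on a hypothetical grammar (E → T, T → F, F → n | ( E )) that is not part of the question. The function name `eliminate_unit_rules` and the dictionary representation are assumptions made for the example.

```python
def eliminate_unit_rules(grammar):
    """Return an equivalent grammar without rules of the form A -> B.

    grammar maps each nonterminal to a list of right-hand sides,
    each right-hand side being a tuple of symbols.
    """
    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in grammar

    result = {}
    for nt in grammar:
        # Collect every nonterminal reachable from nt through unit rules alone.
        reachable, todo = [nt], [nt]
        while todo:
            for rhs in grammar[todo.pop()]:
                if is_unit(rhs) and rhs[0] not in reachable:
                    reachable.append(rhs[0])
                    todo.append(rhs[0])
        # nt inherits the non-unit rules of everything it can reach.
        rules = []
        for target in reachable:
            for rhs in grammar[target]:
                if not is_unit(rhs) and rhs not in rules:
                    rules.append(rhs)
        result[nt] = rules
    return result


# Hypothetical example grammar: E -> T, T -> F, F -> n | ( E )
grammar = {
    'E': [('T',)],
    'T': [('F',)],
    'F': [('n',), ('(', 'E', ')')],
}
print(eliminate_unit_rules(grammar)['E'])  # [('n',), ('(', 'E', ')')]
```

In a recursive-descent parser, the practical effect is that a function which would only forward to another function can branch on the lookahead itself.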
