How to translate EBNF with multiple non-terminal productions to function calls

Question

I'm learning compiler construction and already managed to create small Python scripts that can interpret simple lines of code. However, I'm struggling with the correct way of implementing EBNF statements that offer choices of non-terminal productions. Let's take this EBNF as an example:

expression ::= term
               | expression '+' term
               | expression '-' term

term       ::= factor
               | term '*' factor
               | term '/' factor

factor     ::= NUMBER
               | '(' expression ')'

This grammar describes simple arithmetic expressions such as 5 * (3 + 4).

From the compiler literature I understand the basic approach: terminal symbols (tokens) are matched with if statements, and each non-terminal production becomes a call to a sub-function. With this knowledge I was able to write the function that interprets factor:

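def factor():
    # token is the current token; eat() consumes it after checking its type
    if token.type == 'NUMBER':
        number = token.value
        eat('NUMBER')
        return number
    elif token.type == '(':
        eat('(')
        expr = expression()
        eat(')')
        return expr
    else:
        raise SyntaxError(f"unexpected token {token.type!r}")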
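The snippets in this question rely on a current `token` and on `eat()` and `peek()` helpers that are not shown. A minimal sketch of what they might look like, assuming tokens carry a `type` of `'NUMBER'` for integers and the character itself for operators and parentheses; `Token`, `tokenize` and `TokenStream` are hypothetical names, not part of the question:

```python
from typing import NamedTuple, Union


class Token(NamedTuple):
    type: str                     # 'NUMBER', one of '+-*/()', or 'EOF'
    value: Union[int, str, None]


def tokenize(text):
    """Turn e.g. '5 * (3 + 4)' into a list of Tokens ending with an EOF token."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in '0123456789':
            start = i
            while i < len(text) and text[i] in '0123456789':
                i += 1
            tokens.append(Token('NUMBER', int(text[start:i])))
        elif ch in '+-*/()':
            tokens.append(Token(ch, ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    tokens.append(Token('EOF', None))
    return tokens


class TokenStream:
    """Holds the current token and provides the eat()/peek() helpers."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def peek(self):
        # One token of lookahead past the current one (EOF once exhausted).
        return self.tokens[min(self.pos + 1, len(self.tokens) - 1)]

    def eat(self, token_type):
        # Consume the current token, but only if it has the expected type.
        if self.token.type != token_type:
            raise SyntaxError(f"expected {token_type!r}, got {self.token.type!r}")
        if self.pos < len(self.tokens) - 1:  # stay on EOF once reached
            self.pos += 1
```

With `stream = TokenStream(tokenize(text))`, the question's `token`, `eat` and `peek` would correspond to `stream.token`, `stream.eat` and `stream.peek`.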

What is the recommended way to implement the expression and term non-terminals? I have used a peek() function to look one token ahead:

def expression():
    next_token = peek()
    if token.type in ['NUMBER', '('] and next_token.type == '+':
        left = expression()  # re-enters expression() before consuming any token
        eat('+')
        right = term()
        return (left, '+', right)
    elif token.type in ['NUMBER', '('] and next_token.type == '-':
        left = expression()
        eat('-')
        right = term()
        return (left, '-', right)
    elif token.type in ['NUMBER', '(']:
        return term()

It feels odd to me that I have to look through two levels of the grammar (term and factor) to find the terminal symbols I can use to decide which alternative to take in expression (as in if token.type in ['NUMBER', '('] and next_token.type == '+':). The other thing I'm unsure about: with the approach above, the plain term alternative has to be tested last, so the order in which the alternatives are tested becomes important. Is this the right way to do it?
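Looking "through two levels" for terminals amounts to computing the FIRST set of a non-terminal: the set of tokens that can begin it. A small sketch that computes FIRST sets for the grammar above by fixed-point iteration, assuming the grammar is encoded as a dict of alternatives (the `GRAMMAR` encoding and the `first_sets` name are illustrative) and that there are no empty productions, which holds for this grammar:

```python
# Each non-terminal maps to a list of alternatives; each alternative is a
# list of symbols. Symbols that are not keys of GRAMMAR are terminals.
GRAMMAR = {
    'expression': [['term'], ['expression', '+', 'term'], ['expression', '-', 'term']],
    'term': [['factor'], ['term', '*', 'factor'], ['term', '/', 'factor']],
    'factor': [['NUMBER'], ['(', 'expression', ')']],
}


def first_sets(grammar):
    """Compute FIRST(A) for every non-terminal A (no empty productions assumed)."""
    first = {name: set() for name in grammar}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for name, alternatives in grammar.items():
            for alternative in alternatives:
                head = alternative[0]
                # Without empty productions, only the first symbol matters.
                new = first[head] if head in grammar else {head}
                if not new <= first[name]:
                    first[name] |= new
                    changed = True
    return first
```

For this grammar, expression, term and factor all end up with the same FIRST set, {'NUMBER', '('}. Because every alternative of expression starts with the same tokens, one token of lookahead cannot choose between them, and the alternative expression '+' term begins with expression itself, which is the left recursion.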

You are (apparently) trying to write a recursive descent parser and your grammar is left-recursive. That doesn't work. If you don't understand either of those terms, they make good search terms. Wikipedia has some useful pages. — rici, Oct 23 '18 at 16:17
I tried to spawn new processes to explore alternatives. Don't remember if it exploded in terms of memory. Threads did not work for me (something about child threads not being able to spawn children), and I think I had some issues with pickling before it worked. — Emil, Nov 21 '18 at 20:02
(NOTE: I think it might not be enough to try to convert the leftmost nonterminal; there might be nonterminals in other places that need to be converted in order to match a production rule.) — Emil, Nov 22 '18 at 20:29
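Following up on rici's point that the grammar is left-recursive: a common fix, not spelled out in the question or comments, is to rewrite the left-recursive rules with EBNF repetition, expression ::= term { ('+' | '-') term } and term ::= factor { ('*' | '/') factor }. Each { ... } becomes a while loop, and the current token alone decides whether to continue, so no peek() is needed and the order of the tests no longer matters. A minimal sketch, assuming integer NUMBER tokens and evaluating directly instead of building a tree; the `Parser` class and its method names are illustrative:

```python
import re


class Parser:
    """Recursive-descent evaluator for the grammar with left recursion removed:

        expression ::= term { ('+' | '-') term }
        term       ::= factor { ('*' | '/') factor }
        factor     ::= NUMBER | '(' expression ')'
    """

    def __init__(self, text):
        # Tokens are (type, value) pairs; operators and parentheses use the
        # character itself as their type. Whitespace is skipped by findall.
        self.tokens = []
        for number, symbol in re.findall(r"(\d+)|(\S)", text):
            if number:
                self.tokens.append(('NUMBER', int(number)))
            elif symbol in '+-*/()':
                self.tokens.append((symbol, symbol))
            else:
                raise SyntaxError(f"unexpected character {symbol!r}")
        self.tokens.append(('EOF', None))
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def eat(self, token_type):
        if self.token[0] != token_type:
            raise SyntaxError(f"expected {token_type!r}, got {self.token[0]!r}")
        value = self.token[1]
        self.pos += 1
        return value

    def parse(self):
        result = self.expression()
        self.eat('EOF')  # reject trailing input such as "1 2"
        return result

    def expression(self):
        value = self.term()
        while self.token[0] in ('+', '-'):  # { ('+' | '-') term }
            op = self.eat(self.token[0])
            right = self.term()
            value = value + right if op == '+' else value - right
        return value

    def term(self):
        value = self.factor()
        while self.token[0] in ('*', '/'):  # { ('*' | '/') factor }
            op = self.eat(self.token[0])
            right = self.factor()
            value = value * right if op == '*' else value / right
        return value

    def factor(self):
        if self.token[0] == 'NUMBER':
            return self.eat('NUMBER')
        if self.token[0] == '(':
            self.eat('(')
            value = self.expression()
            self.eat(')')
            return value
        raise SyntaxError(f"unexpected token {self.token[0]!r}")
```

For example, `Parser("5 * (3 + 4)").parse()` returns 35, and `Parser("10 - 4 - 3").parse()` returns 3: the loop folds operators from left to right, keeping the left associativity that the left-recursive rule expressed. To build a tree as in the question, the arithmetic in the loops could be replaced by something like `value = (value, op, right)`.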
