I'm learning compiler construction and already managed to create small Python scripts that can interpret simple lines of code. However, I'm struggling with the correct way of implementing EBNF statements that offer choices of non-terminal productions. Let's take this EBNF as an example:
expression ::= term
    | expression '+' term
    | expression '-' term
term ::= factor
    | term '*' factor
    | term '/' factor
factor ::= NUMBER
    | '(' expression ')'
This is an EBNF to interpret simple mathematical expressions such as 5 * (3 + 4).
From the compiler literature I understand the basic approach: terminal symbols (tokens) are identified with if statements, and for non-terminal productions we call sub-functions. With this knowledge I'm able to write the function that interprets factor:
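def factor():
    # factor ::= NUMBER | '(' expression ')'
    if token.type == 'NUMBER':
        number = token.value
        eat('NUMBER')  # same type name as tested above
        return number
    elif token.type == '(':
        eat('(')
        expr = expression()
        eat(')')
        return expr
    else:
        raise SyntaxError(f"unexpected token {token.type!r}")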
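The snippets in this question rely on a current `token`, an `eat()` helper, and (further down) a `peek()` helper that are not shown. A minimal sketch of what such helpers could look like follows. The regex tokenizer, the module-level state, and the names `Token`, `tokenize`, and `init` are assumptions made for illustration, not part of the original code.

```python
import re
from collections import namedtuple

# Hypothetical token record: type is 'NUMBER', one of + - * / ( ), or 'EOF'.
Token = namedtuple('Token', ['type', 'value'])

# Either a run of digits or a single non-space character, after optional whitespace.
TOKEN_RE = re.compile(r'\s*(?:(\d+)|(\S))')

def tokenize(text):
    """Split text into NUMBER and single-character tokens, ending with EOF."""
    tokens = []
    for number, char in TOKEN_RE.findall(text):
        if number:
            tokens.append(Token('NUMBER', int(number)))
        elif char in '+-*/()':
            tokens.append(Token(char, char))
        else:
            raise SyntaxError(f'unexpected character {char!r}')
    tokens.append(Token('EOF', None))
    return tokens

# Module-level parser state (assumed; the question only shows a bare `token`).
tokens = []
pos = 0
token = None

def init(text):
    """Tokenize text and make its first token the current one."""
    global tokens, pos, token
    tokens = tokenize(text)
    pos = 0
    token = tokens[0]

def eat(expected_type):
    """Consume the current token if it has the expected type, else fail."""
    global pos, token
    if token.type != expected_type:
        raise SyntaxError(f'expected {expected_type!r}, got {token.type!r}')
    pos = min(pos + 1, len(tokens) - 1)  # never step past the EOF token
    token = tokens[pos]

def peek():
    """Return the token after the current one without consuming anything."""
    return tokens[min(pos + 1, len(tokens) - 1)]
```

With these in place, calling `init('5 * (3 + 4)')` makes the NUMBER token for 5 the current token, and `peek()` returns the `*` token.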
What is the recommended way to implement the expression and term non-terminals? I have used a peek() function to look one token ahead:
def expression():
    next_token = peek()
    if token.type in ['NUMBER', '('] and next_token.type == '+':
        left = expression()
        eat('+')
        right = term()
        return (left, '+', right)
    elif token.type in ['NUMBER', '('] and next_token.type == '-':
        left = expression()
        eat('-')
        right = term()
        return (left, '-', right)
    elif token.type in ['NUMBER', '(']:
        return term()
It feels odd to me that I have to look through two levels of the EBNF (term and factor) to find terminal symbols that I can use to decide which of the choices to make in expression (as in if token.type in ['NUMBER', '('] and next_token.type == '+':). The other thing I'm unsure about is that, with the approach above, term needs to be tested last. This means the order in which the non-terminal productions of an EBNF rule are tested becomes important. Is this the right way of doing this?
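The observation about having to look through term and factor can be made concrete by computing which terminals each non-terminal can start with (these are usually called FIRST sets). The following is only a sketch: the dictionary `GRAMMAR` and the function `first_sets` are hypothetical names introduced for illustration, and the code assumes no alternative is empty, which holds for the grammar above.

```python
# The grammar from the question. Each alternative is a list of symbols.
# Keys of GRAMMAR are non-terminals; every other symbol is a terminal.
GRAMMAR = {
    'expression': [['term'],
                   ['expression', '+', 'term'],
                   ['expression', '-', 'term']],
    'term': [['factor'],
             ['term', '*', 'factor'],
             ['term', '/', 'factor']],
    'factor': [['NUMBER'],
               ['(', 'expression', ')']],
}

def first_sets(grammar):
    """Return, for each non-terminal, the set of terminals it can start with.

    Repeats until nothing changes (a fixed point). Because no alternative
    is empty, only the first symbol of each alternative matters.
    """
    first = {name: set() for name in grammar}
    changed = True
    while changed:
        changed = False
        for name, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                starts = first[head] if head in grammar else {head}
                if not starts <= first[name]:
                    first[name] |= starts
                    changed = True
    return first

first = first_sets(GRAMMAR)
for alt in GRAMMAR['expression']:
    head = alt[0]
    starts = first[head] if head in GRAMMAR else {head}
    # Show which tokens each alternative of expression can begin with.
    print(' '.join(alt), 'can start with', sorted(starts))
```

For this grammar, every alternative of expression and of term comes out with the same starting set, NUMBER and '(', so the current token alone cannot tell the alternatives apart.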