Prevent left recursion in ANTLR 4 from matching invalid inputs

Question

I am making a simple programming language. It has the following grammar:

program: declaration+;

declaration: varDeclaration
           | statement
           ;

varDeclaration: 'var' IDENTIFIER ('=' expression)?';';
statement: exprStmt
         | assertStmt
         | printStmt
         | block
         ;

exprStmt: expression';';
assertStmt: 'assert' expression';';
printStmt: 'print' expression';';
block: '{' declaration* '}';

//expression without left recursion
/*
expression: assignment
          ;

assignment: IDENTIFIER '=' assignment
          | equality;

equality: comparison (op=('==' | '!=') comparison)*;

comparison: addition  (op=('>' | '>=' | '<' | '<=') addition)* ;

addition: multiplication (op=('-' | '+') multiplication)* ;

multiplication: unary (op=( '/' | '*' ) unary )* ;

unary: op=( '!' | '-' ) unary
     | primary
     ;
*/

//expression with left recursion
expression: IDENTIFIER '=' expression
          | expression op=('==' | '!=') expression
          | expression op=('>' | '>=' | '<' | '<=') expression
          | expression op=('-' | '+') expression
          | expression op=( '/' | '*' ) expression
          | op=( '!' | '-' ) expression
          | primary
          ;

primary: intLiteral
       | booleanLiteral
       | stringLiteral
       | identifier
       | group
       ;

intLiteral: NUMBER;
booleanLiteral: value=('True' | 'False');
stringLiteral: STRING;
identifier: IDENTIFIER;
group: '(' expression ')';

TRUE: 'True';
FALSE: 'False';
NUMBER:   [0-9]+ ;
STRING: '"' ~('\n'|'"')* '"' ;
IDENTIFIER :   [a-zA-Z]+ ;

This left-recursive grammar is useful because it ensures that every node in the parse tree has at most two operand children. For example, var a = 1 + 2 + 3 turns into two nested addition expressions rather than one addition expression with three operands. That makes writing an interpreter easy, since I can just do (highly simplified):

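public Object visitAddition(AdditionContext ctx) {
    // Assumes the addition alternative is labeled (e.g. "# Addition") and that both operands evaluate to Integer.
    return (Integer) visit(ctx.expression(0)) + (Integer) visit(ctx.expression(1));
}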

instead of iterating through all the child nodes.
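To make the shape concrete, here is a minimal sketch that does not use ANTLR at all. The types Expr, Num, and Add are hypothetical stand-ins for parse-tree nodes, not generated classes. It builds by hand the nesting that a left-associative parse of 1 + 2 + 3 would give and evaluates it with no loop over children:

```java
// Hypothetical stand-ins for parse-tree nodes; these are NOT ANTLR-generated classes.
interface Expr {
    int eval();
}

final class Num implements Expr {
    private final int value;

    Num(int value) {
        this.value = value;
    }

    public int eval() {
        return value;
    }
}

final class Add implements Expr {
    private final Expr left;
    private final Expr right;

    Add(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    // Exactly two operands, so evaluation needs no loop over children.
    public int eval() {
        return left.eval() + right.eval();
    }
}

final class BinaryTreeDemo {
    // Nesting for a left-associative 1 + 2 + 3: Add(Add(1, 2), 3).
    static Expr onePlusTwoPlusThree() {
        return new Add(new Add(new Num(1), new Num(2)), new Num(3));
    }

    public static void main(String[] args) {
        System.out.println(onePlusTwoPlusThree().eval());
    }
}
```

With the flat form (one node with three operands), eval would instead have to iterate over a list of children.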

However, this left-recursive grammar has one flaw: it accepts invalid statements. For example:

var a = 3;
var b = 4;
a = b == b = a;

is valid under this grammar, even though the expected behavior would be:

1. b == b is parsed first, since == has higher precedence than assignment (=).
2. Because b == b is parsed first, the rest of the expression becomes incoherent, and parsing fails.

Instead, the following undesired behavior occurs: the final line is parsed as (a = b) == (b = a).

How can I prevent left recursion from parsing incoherent statements, such as a = b == b = a?

The non-left-recursive grammar recognizes that this input is not correct and throws a parsing error, which is the desired behavior.
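The behavior I expect can be sketched with a small hand-written recursive-descent parser that mirrors only the assignment, equality, and primary rules of the non-left-recursive grammar. This is an illustration, not ANTLR output: the class name AssignmentParser and its methods are hypothetical, tokens are assumed to be separated by whitespace, and operands are limited to identifiers and integers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written parser; it only mirrors the layered grammar rules.
final class AssignmentParser {
    private final List<String> tokens;
    private int pos = 0;

    private AssignmentParser(String input) {
        // Simplifying assumption: tokens are separated by whitespace.
        this.tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    // Returns a fully parenthesized form of the input, or throws if it does not parse.
    static String parse(String input) {
        AssignmentParser p = new AssignmentParser(input);
        String tree = p.assignment();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token '" + p.tokens.get(p.pos) + "'");
        }
        return tree;
    }

    static boolean accepts(String input) {
        try {
            parse(input);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // assignment: IDENTIFIER '=' assignment | equality
    private String assignment() {
        if (isIdentifier(peekAt(pos)) && "=".equals(peekAt(pos + 1))) {
            String target = tokens.get(pos);
            pos += 2;
            return "(" + target + " = " + assignment() + ")";
        }
        return equality();
    }

    // equality: primary (('==' | '!=') primary)*
    private String equality() {
        String left = primary();
        while ("==".equals(peekAt(pos)) || "!=".equals(peekAt(pos))) {
            String op = tokens.get(pos++);
            left = "(" + left + " " + op + " " + primary() + ")";
        }
        return left;
    }

    // primary: IDENTIFIER | NUMBER (literals, groups, and other operators omitted)
    private String primary() {
        String t = peekAt(pos);
        if (t == null || !(isIdentifier(t) || t.matches("[0-9]+"))) {
            throw new IllegalArgumentException("expected operand but found '" + t + "'");
        }
        pos++;
        return t;
    }

    private String peekAt(int i) {
        return i < tokens.size() ? tokens.get(i) : null;
    }

    private static boolean isIdentifier(String t) {
        return t != null && t.matches("[a-zA-Z]+");
    }
}
```

Here AssignmentParser.parse("a = b == b") yields (a = (b == b)), while AssignmentParser.accepts("a = b == b = a") is false: the equality rule consumes b == b, and the trailing = a cannot be matched by any rule.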

Do you want `(a = b) == (b = a)` to be legal? If not, you can just make assignment a statement instead of an expression. — sepp2k, May 16 '19 at 07:15
Btw: You actually have the precedence the wrong way around. The alternatives with the *highest* precedence have to come first. Fixing that won't prevent `a = b == b = a` from being accepted though. — sepp2k, May 16 '19 at 13:35
