How to recognize extra input error in ANTLR?

Question

Here is an ANTLR grammar for simple arithmetic expressions. I would like to get the parse tree for a simple arithmetic expression.

grammar LabeledExpr; // rename to distinguish from Expr.g4

prog:   stat+ ;

stat:   expr NEWLINE                # printExpr
    |   ID '=' expr NEWLINE         # assign
    |   NEWLINE                     # blank
    ;

expr:   expr op=('*'|'/') expr      # MulDiv
    |   expr op=('+'|'-') expr      # AddSub
    |   INT                         # int
    |   ID                          # id
    |   '(' expr ')'                # parens
    ;

MUL :   '*' ; // assigns token name to '*' used above in grammar
DIV :   '/' ;
ADD :   '+' ;
SUB :   '-' ;
ID  :   [a-zA-Z]+ ;      // match identifiers
INT :   [0-9]+ ;         // match integers
NEWLINE:'\r'? '\n' ;     // return newlines to parser (is end-statement signal)
WS  :   [ \t]+ -> skip ; // toss out whitespace

When I input (3+5)*4, ANTLR generates the parse tree of the expression correctly. However, if I input (3+5)4, which is not valid, I get no error and still get a parse tree. From the output, it seems that only (3+5) is matched.

I have noticed similar cases where, once part of the input has been matched, the remaining input is silently ignored. For example, I defined this rule:

relation_op : LESS_THAN | LEQ | GREATER_THAN | GEQ | EQUAL
            | DOUBLE_EQUAL | NEQ ;
            // uppercase names are predefined tokens (<, >, =, ...)

When I then input <dskjkdsd, the parse tree for < is displayed correctly and the extra, invalid input dskjkdsd is ignored.

So what is going wrong here?

score 2 · Accepted Answer · answered Dec 17 '18 at 13:33

By default, a rule matches as much of the input as it can and then leaves the rest in the token stream. So when you feed the input (3+5)4 to the prog rule, the token 4 is still sitting in the token stream afterwards, and you could, in theory, call another rule that consumes it.
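To make "leaves the rest in the token stream" concrete, here is a minimal Python sketch covering only the expr rule. It is a hand-written recursive-descent parser; the tokenizer, the ExprParser class and its methods are hypothetical, and ANTLR's generated parser works differently internally (adaptive prediction, real parse-tree nodes instead of these tuples). Calling expr on (3+5)4 returns the subtree for (3+5) and simply stops, so the 4 token is still unconsumed:

```python
import re

# Hypothetical tokenizer mirroring LabeledExpr's lexer rules;
# WS is dropped, like "-> skip" in the grammar.
TOKEN_PATTERN = re.compile(
    r"(?P<WS>[ \t]+)|(?P<INT>[0-9]+)|(?P<ID>[a-zA-Z]+)"
    r"|(?P<NEWLINE>\r?\n)|(?P<OP>[-+*/=()])"
)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_PATTERN.match(text, pos)
        if m is None:
            raise SyntaxError(f"token recognition error at index {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("EOF", "<EOF>"))
    return tokens


class ExprParser:
    """Hand-written sketch of the expr rule: '*' and '/' bind tighter
    than '+' and '-', and both levels are left-associative."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def consume(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        # AddSub level: keep extending while the next token is '+' or '-'.
        left = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.consume()[1]
            left = (op, left, self.term())
        return left  # returns here; any other next token stays unconsumed

    def term(self):
        # MulDiv level.
        left = self.atom()
        while self.peek()[1] in ("*", "/"):
            op = self.consume()[1]
            left = (op, left, self.atom())
        return left

    def atom(self):
        # int, id and parens alternatives.
        kind, text = self.consume()
        if kind == "INT":
            return int(text)
        if kind == "ID":
            return text
        if text == "(":
            inner = self.expr()
            if self.consume()[1] != ")":
                raise SyntaxError("expecting ')'")
            return inner
        raise SyntaxError(f"unexpected token {text!r}")


parser = ExprParser(tokenize("(3+5)4"))
print(parser.expr())                # prints: ('+', 3, 5)
print(parser.tokens[parser.pos:])   # prints: [('INT', '4'), ('EOF', '<EOF>')]
```

Nothing inside expr looks at what follows its last match; whether the leftover 4 is an error is decided only by whatever rule or caller runs next.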

When you don't want that behavior (which you usually don't for rules that you invoke from your main code), you can add EOF to the end of the rule to require that it match all the way to the end of the input, and report an error if it can't.

So you'll get the errors you expect when you change your prog rule to:

prog: stat+ EOF ;
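The same anchoring applies to the relation_op case from the question: instead of invoking relation_op directly, invoke a start rule such as relation : relation_op EOF ; (the rule name relation is hypothetical). A minimal Python sketch of the difference, assuming <dskjkdsd has already been split into the hypothetical token list shown (how dskjkdsd is really tokenized depends on the rest of your lexer); the RelationParser class is hand-written for illustration, not ANTLR-generated code:

```python
RELATION_OPS = {"LESS_THAN", "LEQ", "GREATER_THAN", "GEQ",
                "EQUAL", "DOUBLE_EQUAL", "NEQ"}


class RelationParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def relation_op(self):
        # Unanchored rule: match one operator token and return,
        # regardless of what follows it.
        kind, text = self.tokens[self.pos]
        if kind not in RELATION_OPS:
            raise SyntaxError(f"no viable alternative at input {text!r}")
        self.pos += 1
        return text

    def relation(self):
        # Anchored start rule, the analogue of "relation : relation_op EOF ;".
        # The error wording below is illustrative only.
        op = self.relation_op()
        kind, text = self.tokens[self.pos]
        if kind != "EOF":
            raise SyntaxError(f"extraneous input {text!r} expecting <EOF>")
        return op


# Hypothetical token stream for the input "<dskjkdsd".
tokens = [("LESS_THAN", "<"), ("ID", "dskjkdsd"), ("EOF", "<EOF>")]

print(RelationParser(tokens).relation_op())  # prints: <   (leftover ignored)
try:
    RelationParser(tokens).relation()
except SyntaxError as err:
    print("error:", err)  # prints: error: extraneous input 'dskjkdsd' expecting <EOF>
```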

