A week ago I started the following project: a grammar which recognizes suffixes of a Java code.
I used the official ANTLR
grammar for Java (Java.g4
) as a baseline and started to add some rules. However, those new rules also introduced left recursion which I also had to deal with.
After a couple of days of work I had the following code. When I started testing I noticed something unusual which I still can't explain. When given the input { }
the parser tells me no viable alternative at input '<EOF>'
but when I switch the order of the terminals in the right-handed side of the rule s2
, particularly if we change the right-handed side from v2_1 | v2_2 | v2_3 ...
to v2_36 | v2_1 | v2_2 ...
(the terminal v2_36
is moved to the first position), the sequence { }
gets accepted.
My first thoughts were that Antlr
does not backtrack because I noticed that with the input { }
the first version of the parser starts to follow the rule v2_3
and just reports that nothing is found and does not try to consider other options (that's what I think but maybe it's not true) like v2_36
which give exactly the positive answer.
But, after some research, I found out that ANTLR
actually does backtrack but only if everything else fails. At least this is true for v3.3 (read it in official ANTLR
paper) but I guess it's also true for v4
. Now I'm a little bit confused. After spending so many hours on this project I would feel really terrible if I don't make it work. Can someone gives some kind of tip or something? It would be greatly appreciated, thanks.
EDIT
Managed to isolate the problem to
grammar Java;
@parser::members {String ruleName; }
start : compilationUnitSuf EOF;
compilationUnitSuf
: {ruleName = "typeDeclarationSuf"; } s2
;
s2: '{' '}' v2_81 | '{' '}';
v2_81 : {ruleName.equals("enumBodyDeclarationsSuf")}? t173 | t173 '}';
t173: '}' | '{'*;
LBRACKET: '{';
RBRACKET: '}';
WS : [ \t\r\n\u000C]+ -> skip
;
So why does the predicting algorithm suggest me to follow s2 -> v'{' '}' v2_81 -> ...
instead of s2 -> '{' '}'
?