
A week ago I started the following project: a grammar that recognizes suffixes of Java code.

I used the official ANTLR grammar for Java (Java.g4) as a baseline and started adding rules. However, the new rules introduced left recursion, which I also had to deal with.

After a couple of days of work I had the following code. When I started testing I noticed something unusual that I still can't explain. Given the input { }, the parser reports no viable alternative at input '<EOF>'. But when I reorder the alternatives on the right-hand side of the rule s2, changing it from v2_1 | v2_2 | v2_3 ... to v2_36 | v2_1 | v2_2 ... (v2_36 is moved to the first position), the sequence { } is accepted.

My first thought was that ANTLR does not backtrack: with the input { }, the first version of the parser starts to follow the rule v2_3, reports that nothing is found, and does not try other options such as v2_36, which would succeed. (That's what I think happens, but I may be wrong.)

But after some research I found out that ANTLR does backtrack, but only if everything else fails. At least this is true for v3.3 (I read it in the official ANTLR paper), and I guess it's also true for v4. Now I'm a little confused. After spending so many hours on this project I would feel really terrible if I couldn't make it work. Can someone give me a tip? It would be greatly appreciated, thanks.

EDIT

Managed to isolate the problem to

grammar Java;
@parser::members {String ruleName; }

start : compilationUnitSuf EOF;

compilationUnitSuf
    :   {ruleName = "typeDeclarationSuf"; } s2
    ;

s2: '{' '}' v2_81 | '{' '}';
v2_81 : {ruleName.equals("enumBodyDeclarationsSuf")}? t173 | t173 '}';
t173: '}' | '{'*;

LBRACKET: '{';
RBRACKET: '}';

WS  :  [ \t\r\n\u000C]+ -> skip
    ;

So why does the prediction algorithm choose s2 -> '{' '}' v2_81 -> ... instead of s2 -> '{' '}'?

mjk
sve
  • I have no idea what you mean by _"suffixes of a Java code"_. – Jim Garrison Aug 28 '13 at 20:07
  • If we have the sequence `a[1..n]` of the tokens of a given piece of Java code, we define a suffix to be the sequence `a[j], a[j + 1], ..., a[n]` for some `1 <= j <= n` (for the code `class A { int a; }`, possible suffixes are `A { int a; }`, `{ int a; }`, `int a; }` etc.), but I think this is irrelevant to the question – sve Aug 28 '13 at 20:17
  • Is there a reason you're using ANTLR? For suffix parsing, a GLR parser would be a lot easier, and it will suffix parse an LR(1) grammar in roughly linear time, iirc. There's a whole chapter about suffix parsing in Grune & Jacobs (Parsing Techniques: A Practical Guide). – rici Aug 29 '13 at 04:00
  • Thank you for your response. I chose ANTLR mainly because it offers a stable, working Java grammar. I took a look at the chapter you mention and noticed that I defined the suffix grammar in the same way as it is defined in the book. I'm also not sure whether there is an existing LR(1) grammar for Java to apply those algorithms to. For the moment I have this modified Java grammar, which I'm pretty sure recognizes the suffixes, but I'm having a hard time figuring out what's wrong. – sve Aug 29 '13 at 10:20
  • Is there any error output? Do you get a warning about multiple alternatives? What about trying `options {greedy=false;}` for s2, could this help? – efan Sep 08 '13 at 10:16
  • I tried your example with the most recent version, ANTLR 4.1, and it worked as it should. The input sequence "{ }" is accepted regardless of the order of the two rules. – Holger Sep 09 '13 at 16:53
  • Sorry, I obviously messed up the example. In the rule `t173`, `+` should be `*`. It's fixed now. – sve Sep 11 '13 at 18:42
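
To make the suffix definition from the comments concrete, here is a minimal Java sketch that lists every suffix `a[j..n]` of a token sequence. The class name `SuffixDemo`, the method `suffixes`, and the hand-written token list are illustrative assumptions and not part of the asker's project; a real setup would take the tokens from the ANTLR lexer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
final class SuffixDemo {
    // Returns every non-empty suffix of the token list, longest first.
    static List<List<String>> suffixes(List<String> tokens) {
        List<List<String>> result = new ArrayList<>();
        for (int j = 0; j < tokens.size(); j++) {
            // Copy the view so each suffix is an independent list.
            result.add(new ArrayList<>(tokens.subList(j, tokens.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed hand tokenization of `class A { int a; }`.
        List<String> tokens = Arrays.asList("class", "A", "{", "int", "a", ";", "}");
        for (List<String> suffix : suffixes(tokens)) {
            System.out.println(String.join(" ", suffix));
        }
    }
}
```

A suffix grammar has to accept each of these sequences, including the full input itself (the case `j = 1`).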

1 Answer


I think you will find that it is not backtracking in the manner that you expect. It matches the { } and then expects to see a v2_81, which it doesn't find. Because it doesn't then backtrack, it never tries the alternative that you want. The alternative is to make the v2_81 optional, so that no backtracking is needed. Something like below:

grammar Java;
@parser::members {String ruleName; }

start : compilationUnitSuf EOF;

compilationUnitSuf
    :   {ruleName = "typeDeclarationSuf"; } s2
    ;

s2: '{' '}' v2_81?;
v2_81 : {ruleName.equals("enumBodyDeclarationsSuf")}? t173 | t173 '}';
t173: '}' | '{'*;

LBRACKET: '{';
RBRACKET: '}';

WS  :  [ \t\r\n\u000C]+ -> skip
    ;
Paul Wagland