ANTLR3 behaves differently for two rules whose intent is effectively the same

Question

While working with an ANTLR3 grammar, I came across a situation where two rules with effectively the same intent behave differently.

I have created a small example.

I want to parse a qualified object name which may be 3-part or 2-part or unqualified (Dot is the separator).

Test Input-
1. SCH.LIB.TAB1;
2. LIB.TAB1;
3. TAB1;

I changed the rule below from using optional sub-rules to using alternatives (ORed together).

Before State-
qualified_object_name
:  
    ( identifier ( ( DOT identifier )? DOT identifier )? )
;

After State-
qualified_object_name_new
:  
    ( identifier DOT  identifier DOT identifier )  // 3 part name
    | ( identifier DOT identifier )                // 2 part name
    | ( identifier )                               // 1 part name
;

Input 1 is parsed correctly by both rules, but the new rule gives an error when parsing inputs 2 and 3:

line 1:22 no viable alternative at input ';'

I assumed that ANTLR would first try to match alternative 1 of qualified_object_name_new and, if the input did not fully match it, would then try alternative 2, and so on.
So, for the input 'LIB.TAB1', it would eventually match alternative 2 of qualified_object_name_new. However, it does not work this way and gives an error when parsing a 2-part name or an unqualified name.

Interestingly, when I set the option k = 1, all 3 inputs are parsed correctly by the new rule. With any other value of k, it gives the error.

I want to understand why ANTLR behaves this way and whether this behavior is correct.

Most likely a bug. If you can, switch to ANTLR4, since development (and community support) for v3 is limited. — Bart Kiers, Mar 13 '18 at 13:29
@BartKiers Thanks. Works fine with Antlr4. However I cannot switch to Antlr4 as of now, so will try to use a workaround for now. — asthac, Mar 14 '18 at 11:45

score 0 · Answer 1 · answered Mar 15 '18 at 08:43

You probably have not increased the lookahead size (which is 1 by default in ANTLR3). With one token of lookahead, the new object name rule cannot resolve the ambiguity (all alts start with the same token). You should have gotten a warning about this too.

You have 3 options to solve this problem with ANTLR3 (though I also recommend switching to version 4):

1. Enable backtracking (see the backtrack option), though I'm not 100% sure that it really helps.
2. Increase the lookahead size (see the k option).
3. Use syntactic predicates to force ANTLR to look ahead over the entire alt.

For more details read the ANTLR3 documentation.
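
As a rough illustration, the backtracking and syntactic-predicate options above could be applied to the rule from the question along these lines. This is only a sketch: the rule names `qualified_object_name_bt` and `qualified_object_name_pred` are made up for this example, `identifier` and `DOT` are assumed to be defined as in the asker's grammar, and nobody in this thread has confirmed that either variant removes the reported error.

```
// Sketch of the backtracking option, set at rule level.
// Hypothetical rule name; the alternatives are the asker's.
qualified_object_name_bt
options { backtrack = true; }
    : identifier DOT identifier DOT identifier   // 3-part name
    | identifier DOT identifier                  // 2-part name
    | identifier                                 // 1-part name
    ;

// Sketch of explicit syntactic predicates: each "( ... ) =>" asks the
// parser to speculatively match the whole alternative before choosing it.
// Hypothetical rule name.
qualified_object_name_pred
    : ( identifier DOT identifier DOT identifier ) => identifier DOT identifier DOT identifier
    | ( identifier DOT identifier ) => identifier DOT identifier
    | identifier
    ;
```

The two variants are closely related: with backtrack = true, ANTLR3 adds such predicates itself where it cannot otherwise resolve a decision, which is shorter to write but can also hide ambiguity warnings that the explicit form keeps visible.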

Mike Lischke

v3 supports variable lookahead / LL(\*) ([ref](https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687279/What+is+the+difference+between+ANTLR+v2+and+v3)) and it should be default afaik. If this is indeed a lookahead problem, I suggest removing the `k` grammar option altogether and using LL(\*). – Jiri Tousek Mar 15 '18 at 09:53
Weird, I don't see where this LL(*) algorithm is actually applied. When you look at the generated code you can see it's full of huge case statements to do the look ahead (one case block for each k). – Mike Lischke Mar 16 '18 at 08:27
Generated code seems to use case statements whenever it can avoid LL(\*), using a sort of per-decision fixed lookahead. However, in cases where it truly needs LL(\*), it produces an `org.antlr.runtime.DFA` subclass and instance. – Jiri Tousek Mar 16 '18 at 19:41
OK, thanks @JiriTousek. It would be good if the OP could tell us whether any of my suggestions worked. – Mike Lischke Mar 17 '18 at 09:13
