The ANTLR book has the following sample code, which resolves grammar ambiguities using semantic predicates:
// predicates/PredCppStat.g4
@parser::members {
    Set<String> types = new HashSet<String>() {{add("T");}};
    boolean istype() { return types.contains(getCurrentToken().getText()); }
}
stat: decl ';' {System.out.println("decl "+$decl.text);}
    | expr ';' {System.out.println("expr "+$expr.text);}
    ;
decl: ID ID
    | {istype()}? ID '(' ID ')'
    ;
expr: INT
    | ID
    | {!istype()}? ID '(' expr ')'
    ;
ID : [a-zA-Z]+ ;
INT : [0-9]+ ;
WS : [ \t\n\r]+ -> skip ;
Here, the predicate is the first thing evaluated in an alternative, and it determines whether that alternative is viable. It uses getCurrentToken() to make its decision.
However, suppose we alter the grammar slightly to use hierarchical names instead of a simple ID, like this:
decl: ID ID
    | {istype()}? hier_id '(' ID ')'
    ;
expr: INT
    | ID
    | {!istype()}? hier_id '(' expr ')'
    ;
hier_id : ID ('.' ID)* ;
Then the istype() predicate can no longer rely on getCurrentToken() alone to make its decision. It needs the entire chain of tokens in hier_id to determine whether the chain names a type symbol or not.
That means we will need to do one of the following:
(1) Put the predicate after hier_id, and access the value of hier_id from istype(). Is this possible? I tried it, and I am getting compiler errors in the generated code.
(2) Break up the grammar into sub-rules, and then place istype() after the hier_id tokens are consumed. But this would wreck the readability of the grammar, and I would rather not do it.
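For illustration, a third route would be to keep the predicate in front of hier_id and let istype() peek ahead over the whole ID ('.' ID)* chain without consuming it. The sketch below is a standalone Java model of that scan, not generated parser code, and it has not been checked against antlr-4.6. The class HierIdLookahead, the lt() helper, and the "a.b.T" entry in the type set are hypothetical names chosen for the example; in an ANTLR 4 parser, lt(k) would correspond to _input.LT(k), and the token checks would more likely compare token types than text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone model of a lookahead-based istype(); all names here are hypothetical.
class HierIdLookahead {
    private final List<String> tokens; // stand-in for the parser's token stream
    private final int pos;             // index of the current token, i.e. LT(1)
    // "a.b.T" is an assumed hierarchical type name added for the example.
    private final Set<String> types = new HashSet<>(Arrays.asList("T", "a.b.T"));

    HierIdLookahead(List<String> tokens, int pos) {
        this.tokens = tokens;
        this.pos = pos;
    }

    // Mimics TokenStream.LT(k): k-th lookahead token (1-based), null past the end.
    String lt(int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : null;
    }

    // Matches the ID lexer rule from the grammar: [a-zA-Z]+
    static boolean isId(String s) {
        return s != null && s.matches("[a-zA-Z]+");
    }

    // Collects ID ('.' ID)* starting at LT(1) without consuming any token.
    String peekHierId() {
        if (!isId(lt(1))) {
            return null;
        }
        StringBuilder name = new StringBuilder(lt(1));
        int k = 2;
        while (".".equals(lt(k)) && isId(lt(k + 1))) {
            name.append('.').append(lt(k + 1));
            k += 2; // skip the '.' and the ID just appended
        }
        return name.toString();
    }

    // True when the dotted name at the current position is a known type.
    boolean istype() {
        String name = peekHierId();
        return name != null && types.contains(name);
    }
}
```

With the token sequence a . b . T ( x ), the scan would assemble "a.b.T" before the predicate answers, so the grammar itself would not need extra sub-rules. Whether this interacts well with ANTLR's prediction is part of what I am asking.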
What is the best way to solve this problem? I am using ANTLR 4.6.