Token collision (??) writing ANTLR4 grammar

Question

I have what I thought was a very simple grammar to write:

I want it to allow tokens called facts. A fact must start with a letter, which can be followed by any of these: a letter, a digit, % or _
I want to concatenate two facts with a . but the second fact does not have to start with a letter (a digit, % or _ is also a valid first character from the second fact onward)
Any "subfact" (even the initial one) in the whole fact can be "instantiated" like an array (the examples below show what I mean)

For example:

Foo
Foo%
Foo_12%
Foo.Bar
Foo.%Bar
Foo.4_Bar
Foo[42]
Foo['instance'].Bar
etc

I tried to write such a grammar, but I can't get it working:

grammar Common;

/*
 * Parser Rules
 */
fact: INITIALFACT instance? ('.' SUBFACT instance?)*;
instance: '[' (LITERAL | NUMERIC) (',' (LITERAL | NUMERIC))* ']';

/*
 * Lexer Rules
 */
INITIALFACT: [a-zA-Z][a-zA-Z0-9%_]*;
SUBFACT: [a-zA-Z%_]+;
ASSIGN: ':=';
LITERAL: ('\'' .*? '\'') | ('"' .*? '"');
NUMERIC: ([1-9][0-9]*)?[0-9]('.'[0-9]+)?;

WS: [ \t\r\n]+ -> skip;

For example, if I try to parse Foo.Bar, I get: Syntax error line 1 position 4: mismatched input 'Bar' expecting SUBFACT.

I think this is because ANTLR first finds that Bar matches INITIALFACT and stops there. How can I fix this?

If it is relevant, I am using Antlr4cs.
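
To make the suspected collision concrete, here is a small Python sketch (a toy model, not ANTLR itself) of the two disambiguation rules an ANTLR4 lexer is generally described as applying: take the longest match at the current position and, on a tie, prefer the rule defined first. The `tokenize` helper and the `DOT` token name are hypothetical; the two fact patterns are copied from the grammar above.

```python
import re

# Lexer rules in definition order. Only the rules needed for the demo are
# listed; DOT stands in for the implicit '.' literal token (assumed name).
ORIGINAL_RULES = [
    ("INITIALFACT", re.compile(r"[a-zA-Z][a-zA-Z0-9%_]*")),
    ("SUBFACT", re.compile(r"[a-zA-Z%_]+")),
    ("DOT", re.compile(r"\.")),
]


def tokenize(text, rules):
    """Toy lexer: longest match wins; on a tie the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("Foo.Bar", ORIGINAL_RULES))
# [('INITIALFACT', 'Foo'), ('DOT', '.'), ('INITIALFACT', 'Bar')]
```

Both INITIALFACT and SUBFACT match all three characters of Bar, so the tie goes to INITIALFACT, which is consistent with the parser reporting that it expected SUBFACT. The lexer makes this choice without knowing what the parser expects at that point.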

Your conclusion [is correct](http://stackoverflow.com/documentation/antlr/3271/lexer-rules-in-v4/11235/priority-rules#t=201702231150341687047). Just use a single `FACT` lexer rule everywhere, and check for correct naming in a post-processing step. — Lucas Trzesniewski, Feb 23 '17 at 11:52
@MikeLischke No. Whatever subfact name I tried, it failed with the same error. — fharreau, Feb 23 '17 at 13:06
@LucasTrzesniewski Doing what you said fixed my issue but forced me to move the `NUMERIC` lexer rule to first place. Otherwise, in `Fact[1]`, `1` would match `FACT` and not `NUMERIC`. It is a pity that we cannot handle this differently. Thanks for the links, by the way! — fharreau, Feb 23 '17 at 13:17
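
The post-processing step suggested in the comments could look roughly like the Python sketch below. It assumes the merged lexer rule has the shape `FACT: [a-zA-Z0-9%_]+;` and that the name segments of one fact have already been collected from the parse tree; `check_fact_names` and both pattern names are hypothetical, not part of ANTLR or Antlr4cs.

```python
import re

# Assumption: the single merged lexer rule is  FACT: [a-zA-Z0-9%_]+ ;
FACT_PATTERN = re.compile(r"[a-zA-Z0-9%_]+")
STARTS_WITH_LETTER = re.compile(r"[a-zA-Z]")


def check_fact_names(segments):
    """Return error messages for the dot-separated name segments of one fact."""
    errors = []
    for index, name in enumerate(segments):
        if not FACT_PATTERN.fullmatch(name):
            errors.append(f"segment {index} {name!r}: invalid character")
        elif index == 0 and not STARTS_WITH_LETTER.match(name):
            # Only the initial fact is required to start with a letter.
            errors.append(f"segment {index} {name!r}: must start with a letter")
    return errors


print(check_fact_names(["Foo", "4_Bar"]))  # []
print(check_fact_names(["4_Bar", "Foo"]))  # one error, for the first segment
```

In a real project this check would probably live in a listener or visitor over the `fact` rule. One consequence of placing `NUMERIC` ahead of `FACT` is worth keeping in mind: an all-digit segment such as the `42` in `Foo.42` would most likely be tokenized as `NUMERIC`, so the parser rule may need to accept that token as a subfact if such names are meant to be valid.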

