I have what I thought was a very simple grammar to write:
- I want it to allow a token called fact. Such a token must start with a letter, which can then be followed by any of these: letter, digit, % or _
- I want to concatenate two facts with a ., but the second fact does not have to start with a letter (a digit, % or _ is also valid at the start of the second and later tokens)
- Any "subfact" (even the initial one) in the whole fact can be "instantiated" like an array (you will get it by reading my examples)
For example:
- Foo
- Foo%
- Foo_12%
- Foo.Bar
- Foo.%Bar
- Foo.4_Bar
- Foo[42]
- Foo['instance'].Bar
- etc
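To pin down the rules above, here is a rough sketch of them as a plain Python regular expression rather than ANTLR syntax. The names `INITIAL`, `SUB`, `VALUE`, `INSTANCE` and `is_fact` are made up for illustration, and the exact form of an instance value (a quoted string or a simple number) is an assumption based on the examples.

```python
import re

# Hypothetical regex rendering of the rules above (not ANTLR syntax).
INITIAL = r"[A-Za-z][A-Za-z0-9%_]*"  # first part must start with a letter
SUB = r"[A-Za-z0-9%_]+"              # later parts may start with a digit, % or _
# Assumed instance value: a quoted string or a simple number.
VALUE = r"(?:'[^']*'|\"[^\"]*\"|\d+(?:\.\d+)?)"
INSTANCE = rf"\[{VALUE}(?:,\s*{VALUE})*\]"
FACT = re.compile(rf"{INITIAL}(?:{INSTANCE})?(?:\.{SUB}(?:{INSTANCE})?)*")


def is_fact(text: str) -> bool:
    """Return True if the whole string is a valid fact under the rules above."""
    return FACT.fullmatch(text) is not None


if __name__ == "__main__":
    examples = ["Foo", "Foo%", "Foo_12%", "Foo.Bar", "Foo.%Bar",
                "Foo.4_Bar", "Foo[42]", "Foo['instance'].Bar"]
    for example in examples:
        print(example, is_fact(example))
```

Every example in the list should be accepted, while something like `4Foo` should be rejected because the initial part starts with a digit.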
I tried to write such a grammar, but I can't get it working:
grammar Common;
/*
* Parser Rules
*/
fact: INITIALFACT instance? ('.' SUBFACT instance?)*;
instance: '[' (LITERAL | NUMERIC) (',' (LITERAL | NUMERIC))* ']';
/*
* Lexer Rules
*/
INITIALFACT: [a-zA-Z][a-zA-Z0-9%_]*;
SUBFACT: [a-zA-Z%_]+;
ASSIGN: ':=';
LITERAL: ('\'' .*? '\'') | ('"' .*? '"');
NUMERIC: ([1-9][0-9]*)?[0-9]('.'[0-9]+)?;
WS: [ \t\r\n]+ -> skip;
For example, if I try to parse Foo.Bar, I get: Syntax error line 1 position 4: mismatched input 'Bar' expecting SUBFACT.
I think this is because the ANTLR lexer first finds that Bar matches INITIALFACT and stops there, so the parser never sees a SUBFACT token. How can I fix this?
If it is relevant, I am using Antlr4cs.
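The suspicion above can be illustrated with a rough Python simulation of how, as far as I understand it, an ANTLR 4 lexer chooses a token: it takes the rule that matches the most input, and on a tie it takes the rule listed first. This is only a sketch, not ANTLR itself. `RULES`, `tokenize` and the token name `DOT` (standing in for the implicit '.' literal) are hypothetical, and the other lexer rules and whitespace handling are left out.

```python
import re

# Lexer rules in grammar order. DOT stands in for the implicit '.' literal.
RULES = [
    ("DOT", re.compile(r"\.")),
    ("INITIALFACT", re.compile(r"[a-zA-Z][a-zA-Z0-9%_]*")),
    ("SUBFACT", re.compile(r"[a-zA-Z%_]+")),
]


def tokenize(text):
    """Split text into (rule name, lexeme) pairs: longest match, first rule on ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Strictly longer wins, so on a tie the earlier rule is kept.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


if __name__ == "__main__":
    print(tokenize("Foo.Bar"))
    print(tokenize("Foo.%Bar"))
```

In this sketch `Bar` comes out as INITIALFACT, because both rules match three characters and INITIALFACT is listed first, while `%Bar` comes out as SUBFACT, because INITIALFACT cannot start with %. The lexer has no knowledge of what the parser expects at that point, which would explain the error message.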