
I have a simple rule like so:

ifClause: 'if' '(' condition ')' '{' (structField)+ '}' ;
condition: .*?;

This works for parsing:

if (abc == def) {
    <something>
}

But errors out on:

if (abc.xyz == def) {
    <something>
}

with the error:

line NN:MM token recognition error at: '.'

Why does .*? not consume the '.' character?

I am using ANTLR 4.5.3 with the Python target.

shikhanshu

1 Answer

First, the parser rule

condition: .*?;

consumes tokens produced by the lexer, not raw characters.

Second, 'token recognition' errors come from the lexer, not the parser. They occur when, as here, no lexer rule matches a character in the input. By default, the lexer reports the error, skips the unrecognized character without producing a token for the parser, and continues matching the rest of the input stream.

To fix, ensure that a '.' will be matched by a lexer rule.
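
As a rough sketch of what that could look like: the question does not show the grammar's lexer rules, so the names DOT and ANY below are hypothetical and would need to fit in with whatever rules already exist.

```antlr
// Hypothetical rule names; the original grammar's lexer rules are not shown.
DOT : '.' ;   // gives the lexer a token for the '.' in abc.xyz

// Optional catch-all. Place it after every other lexer rule so that any
// otherwise unmatched single character still becomes a token instead of
// triggering a token recognition error.
ANY : . ;
```

With a token available for '.', the parser rule condition: .*?; can consume it like any other token. Note that a '.' inside a lexer rule matches any single character, whereas inside a parser rule it matches any single token.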

GRosenberg
  • I changed the "condition" to "COND: .*?", and I now run into different issues about ambiguity, which are unrelated to the question. Thank you! – shikhanshu Oct 13 '16 at 23:19