Please consider the following grammar, which gives me unexpected behavior:

lexer grammar TLexer;
WS                      : [ \t]+  -> channel(HIDDEN) ;
NEWLINE                 : '\n' -> channel(HIDDEN) ;
ASTERISK                : '*' ;
SIMPLE_IDENTIFIER       : [a-zA-Z_] [a-zA-Z0-9_$]* ;
NUMBER                  : [0-9] [0-9_]* ;

and

parser grammar TParser;
options { tokenVocab=TLexer; }
seq_input_list :
   level_input_list | edge_input_list ;

level_input_list :
  ( level_symbol_any )+ ;

edge_input_list :
  ( level_symbol_any )*  edge_symbol ;

level_symbol_any :
  {getCurrentToken().getText().matches("[0a]")}? ( NUMBER | SIMPLE_IDENTIFIER ) ;

edge_symbol :
  SIMPLE_IDENTIFIER | ASTERISK ;

The input `0 *` is parsed fine, but `0 f` is not recognized by the parser (no viable alternative at input 'f'). If I swap the order of the two alternatives in seq_input_list, both inputs are recognized.

My question is whether this is indeed an ANTLR issue or whether I misunderstand how semantic predicates are used. I would expect the input `0 f` to be recognized as (seq_input_list (edge_input_list (level_symbol_any NUMBER) (edge_symbol SIMPLE_IDENTIFIER))).
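
To make the predicate's effect concrete, here is a minimal sketch in plain Java (no ANTLR runtime involved) of what the check in level_symbol_any evaluates to for the token texts in question. The class name `PredicateCheck` and the method `passesPredicate` are hypothetical names used only for illustration; the regex is the only part taken from the grammar.

```java
public class PredicateCheck {
    // Hypothetical helper mirroring the predicate body in level_symbol_any.
    // String.matches requires the WHOLE token text to match the regex,
    // so only the single characters "0" and "a" pass.
    static boolean passesPredicate(String tokenText) {
        return tokenText.matches("[0a]");
    }

    public static void main(String[] args) {
        // Token texts taken from the inputs discussed in the question.
        for (String text : new String[] {"0", "a", "f", "1", "*"}) {
            System.out.println("'" + text + "' -> " + passesPredicate(text));
        }
    }
}
```

Since `f` fails this check, it cannot be a level_symbol_any, which is why I would expect it to be consumed as edge_symbol in `0 f`.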

Thank you in advance!

Julian

J. Nagel
  • Print the tokens in your token stream and look at what the lexer recognized. Is that what you (and your parser) expected? – Mike Lischke Jun 16 '18 at 16:53
  • For `0 *` I get [NUMBER('0'), WS(' '), ASTERISK('*'), EOF] and for `0 f` [NUMBER('0'), WS(' '), SIMPLE_IDENTIFIER('f'), EOF]. This is what I expect and what I would expect the parser to recognize as valid input. – J. Nagel Jun 18 '18 at 09:45
  • Yes, in fact the grammar works fine. I just tried it out. The error must come from another problem (for example, the parser not being regenerated after a change). – Mike Lischke Jun 19 '18 at 08:17
  • I am sorry, I cannot see what I could be doing wrong, since I automatically (and now also manually) delete all generated files and regenerate the parser afterwards. Would you please specify how the grammar worked for you? What are the trees for `0 f`, `0 *` and `1 *`? Thank you for your help! P.S. Using ANTLR 4.7.1 – J. Nagel Jun 19 '18 at 12:17
