
I have the following ANTLR grammar that forms part of a larger expression parser:

grammar ProblemTest;

atom    :   constant
    |   propertyname;

constant:   (INT+ | BOOL | STRING | DATETIME);

propertyname
    :   IDENTIFIER  ('/' IDENTIFIER)*;

IDENTIFIER 
    :   ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+;

INT 
    :   '0'..'9'+;

BOOL    :   ('true' | 'false');

DATETIME
    :   'datetime\'' '0'..'9'+ '-' '0'..'9'+ '-' + '0'..'9'+ 'T' '0'..'9'+ ':' '0'..'9'+ (':' '0'..'9'+ ('.' '0'..'9'+)*)* '\'';

STRING
    :  '\'' ( ESC_SEQ | ~('\\'|'\'') )* '\''
    ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

If I run this in the interpreter from within ANTLRWorks with the input

'Hello\\World' 

then it is interpreted as a propertyname instead of a constant. The same happens if I compile this in C# and run it in a test harness, so it is not a problem with the dodgy interpreter.

I'm sure I am missing something really obvious... but why is this happening? It's clear there is a problem with the string matcher, but I would have thought at the very least that the fact that IDENTIFIER does not match the ' character would mean that this would throw a NoViableAltException instead of just falling through?

beyond-code
  • Not sure whether it is related, but your INT rule is probably wrong. It should be written as ('0'..'9')+. Also, with the plus operator there, you won't need another one in the constant rule. – Mike Lischke Apr 16 '13 at 12:15

1 Answer


First, neither ANTLRWorks nor antlr-3.5-complete.jar can be used to generate code for the C# targets. They might produce files ending in .cs, and those files might even compile, but those files will not be the same as the files produced by the C# port of the ANTLR Tool (Antlr3.exe) or the recommended MSBuild integration. Make sure you are producing your generated parser by one of the tested methods.

Second, INT will never be matched. IDENTIFIER appears before INT in the grammar, and every input that matches '0'..'9'+ matches both IDENTIFIER and INT, so the lexer always picks the rule that appears first (IDENTIFIER).
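To make the ordering point concrete, here is a minimal Python sketch. It is only a simplified model of "first listed rule wins when several rules match the same text", not ANTLR's actual prediction algorithm. The regular expressions are stand-ins for the two lexer rules, and the names `RULES_AS_WRITTEN`, `RULES_REORDERED`, and `first_matching_rule` are hypothetical, introduced purely for illustration.

```python
import re

# Simplified stand-ins for the two lexer rules, listed in grammar order.
RULES_AS_WRITTEN = [
    ("IDENTIFIER", re.compile(r"[A-Za-z0-9_]+")),
    ("INT", re.compile(r"[0-9]+")),
]

# Hypothetical reordering for comparison: INT listed before IDENTIFIER.
RULES_REORDERED = [RULES_AS_WRITTEN[1], RULES_AS_WRITTEN[0]]


def first_matching_rule(rules, text):
    """Return the name of the first rule that matches all of `text`, or None."""
    for name, pattern in rules:
        if pattern.fullmatch(text):
            return name
    return None


# With the grammar order as written, a digit-only input is claimed by IDENTIFIER.
print(first_matching_rule(RULES_AS_WRITTEN, "123"))
# With INT listed first, the same input is claimed by INT.
print(first_matching_rule(RULES_REORDERED, "123"))
# Inputs containing letters can only match IDENTIFIER in either order.
print(first_matching_rule(RULES_REORDERED, "abc1"))
```

Under this model, listing INT first lets digit-only input become INT. The trade-off is that a digit-only name such as 123 could then no longer be lexed as an IDENTIFIER, which may or may not be what the larger grammar wants.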

Sam Harwell
  • Thanks, I hadn't spotted that. I am indeed using the C# port of the tool to generate my actual parser. Do you have any insights on the problem at hand? – beyond-code Apr 16 '13 at 13:26