I have the following ANTLR (version 3) grammar:

grammar GRM;


options
{
    language = C;
    output = AST;
}


create_statement : CREATE_KEYWORD SPACE_KEYWORD FILE_KEYWORD SPACE_KEYWORD value -> ^(value);

value : NUMBER | STRING;


CREATE_KEYWORD : 'CREATE';

FILE_KEYWORD : 'FILE';

SPACE_KEYWORD : ' ';


NUMBER : DIGIT+;

STRING : (LETTER | DIGIT)+;


fragment DIGIT : '0'..'9';

fragment LETTER : 'a'..'z' | 'A'..'Z';

With this grammar, I can successfully parse strings like `CREATE FILE dump` or `CREATE FILE output`. However, parsing a string like `CREATE FILE file` fails. ANTLR matches the text `file` in that string with the lexer rule `FILE_KEYWORD`, which is not the match I expected: I expected it to match the lexer rule `STRING`.

How can I force ANTLR to do this?

  • Did you post your actual grammar? Only `FILE` would become a `FILE_KEYWORD`; `file` would become a `STRING`. – Bart Kiers Apr 26 '21 at 19:19
  • @BartKiers: thanks! I have updated the grammar posted above. It compiles perfectly and when I run the C program that uses it, it gives the following (funny!) error (when string `CREATE FILE file` is parsed): `GRM(1) : error 4 : Unexpected token, at offset 11 near [Index: 4 (Start: -2023230244-Stop: -2023230241) ='file', type<81> Line: 1 LinePos:11] : unexpected input... expected one of : Actually dude, we didn't seem to be expecting anything here, or at least I could not work out what I was expecting, like so many of us these days!`. –  Apr 27 '21 at 07:43
  • Given that it compiles, the ANTLR tool itself worked. The error comes from ANTLR's C runtime: can't help you there. – Bart Kiers Apr 27 '21 at 13:44
  • Thanks @BartKiers - is it working with other targets like, e.g., Java? –  Apr 28 '21 at 07:20
  • Yes, Java works fine. – Bart Kiers Apr 28 '21 at 19:20

1 Answer

Your problem seems to be a variant of the classic contextual keyword vs. identifier issue.

Either `value` should be a lexer rule rather than a parser rule (the input is tokenized before the parser runs, so a parser rule comes too late to influence the token type), or you should reorder the lexer rules (or both).

Hence using `VALUE : NUMBER | STRING;` (a lexer rule) instead of the lowercase `value` (a parser rule) should help. The order of the lexer rules is also important: the definition of the identifier rule (`VALUE` in your code) usually comes after the keyword definitions.

See also: 'IDENTIFIER' rule also consumes keyword in ANTLR Lexer grammar

grammar GRM;


options
{
    language = C;
    output = AST;
}


create_statement : CREATE_KEYWORD SPACE_KEYWORD FILE_KEYWORD SPACE_KEYWORD value -> ^(value);


CREATE_KEYWORD : 'CREATE';

FILE_KEYWORD : 'FILE';

value : (LETTER | DIGIT)+ | FILE_KEYWORD | CREATE_KEYWORD;

SPACE_KEYWORD : ' ';

This works for me in ANTLRWorks for the input `CREATE FILE file` and, if needed, for the input `CREATE FILE FILE`.

Yann TM
  • "Your problem is the classic contextual keyword vs identifier issue." Not unless OP is doing something to make the input case insensitive. Otherwise there's no reason why lowercase "file" should be seen as a keyword. Anyway, I don't see how your suggested change makes any difference (other than producing different token types, of course). – sepp2k May 05 '21 at 11:33
  • To clarify: You say that it works for you with your suggested changes, but have you tried it without the changes first? It still works, right? What doesn't work would be an input like `CREATE FILE FILE` and that still won't work with the change you suggested. – sepp2k May 05 '21 at 11:40
  • OK, I changed the grammar to support `CREATE FILE FILE`; perhaps I'm misunderstanding the problem. – Yann TM May 05 '21 at 11:55
  • I have no idea what OP's problem is. The grammar that OP posted works on the input that OP gave (as already pointed out by Bart Kiers). Maybe OP actually used the input `CREATE FILE FILE` and posted the wrong input here. Or maybe OP is preprocessing the input stream to make it case insensitive and neglected to mention that. Or maybe it's an unrelated problem, possibly related to the C runtime or OP's C code. If the problem *is* about making `FILE` a contextual keyword, the fix is as simple as changing `value` to `value : NUMBER | STRING | FILE_KEYWORD ;`. – sepp2k May 05 '21 at 12:07
  • The other changes you made are not necessary to make `FILE` a contextual keyword and are a bad idea. By making `LETTER` a token on its own, you broke the maximum munch rule and something like "FILEX" would now be tokenized as `FILE_KEYWORD`, followed by `STRING`. Also note that your description (which talks about making `value` a lexer rule) and your code (which correctly has `value` as a parser rule) no longer match. – sepp2k May 05 '21 at 12:10
  • OK, whatever, I'm just trying to help. Whether this fails to solve the OP's problem, or whether the OP is actually making the input case insensitive (which is not ANTLR's default, or even easy to do), I don't know. Yes, I agree overall with your comments, @sepp2k, but unless the OP can improve the question, I think this is as much effort as is reasonable to put into it. – Yann TM May 05 '21 at 17:38
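
For reference, the minimal change sepp2k describes in the comments above can be sketched against the question's grammar as follows. This assumes the goal really is to let `FILE` double as a value, which the question does not confirm. Everything except the extra `FILE_KEYWORD` alternative in `value` (and the comments) is copied from the question, and the sketch has not been checked against the C runtime, where the reported error originates.

```antlr
grammar GRM;

options
{
    language = C;
    output = AST;
}

create_statement : CREATE_KEYWORD SPACE_KEYWORD FILE_KEYWORD SPACE_KEYWORD value -> ^(value);

// Only change: also accept the FILE keyword token where a value is expected,
// so the keyword text can be used as a file name (a "contextual keyword").
value : NUMBER | STRING | FILE_KEYWORD;

// Keyword rules stay ahead of STRING so that exact keyword text is still
// tokenized as a keyword.
CREATE_KEYWORD : 'CREATE';

FILE_KEYWORD : 'FILE';

SPACE_KEYWORD : ' ';

NUMBER : DIGIT+;

STRING : (LETTER | DIGIT)+;

// Fragments are only building blocks for other lexer rules;
// they never become tokens on their own.
fragment DIGIT : '0'..'9';

fragment LETTER : 'a'..'z' | 'A'..'Z';
```

With this change, `CREATE FILE FILE` should parse with the final `FILE` accepted as a `value`, while lowercase `file` is still tokenized as `STRING`, as in the original grammar. Keeping `LETTER` and `DIGIT` as fragments preserves the longest-match behavior mentioned in the comments, so an input such as `FILEX` stays a single `STRING` token.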