
I have seen many ANTLR grammars that use whitespace handling like this:

WS: [ \n\t\r]+ -> skip;
// or
WS: [ \n\t\r]+ -> channel(HIDDEN);

So whitespace is either discarded or sent to the hidden channel.

With a grammar like this:

grammar Not;

start:      expression;
expression: NOT expression
          | (TRUE | FALSE);

NOT:    'not';
TRUE:   'true';
FALSE:  'false';
WS: [ \n\t\r]+ -> skip;

valid inputs are 'not true' or 'not false', but also 'nottrue', which is not the desired result. Changing the grammar to:

grammar Not;

start:      expression;

expression: NOT WS+ expression
          | (TRUE | FALSE);

NOT:    'not';

TRUE:   'true';
FALSE:  'false';

WS: [ \n\t\r];

fixes the problem, but I do not want to handle whitespace manually in each rule.

Generally, I want to require whitespace between tokens, with some exceptions (e.g. '!true' does not need whitespace in between).

Is there a simple way of doing this?

flux

2 Answers


Add an IDENTIFIER lexer rule to handle words which are not keywords.

IDENTIFIER : [a-zA-Z]+;

Now the text 'nottrue' is a single IDENTIFIER token, because the lexer always takes the longest possible match. Your parser will not accept that token in place of the two distinct keyword tokens in 'not true'.

Make sure IDENTIFIER is defined after your other keywords. The lexer will find that both NOT and IDENTIFIER match the text not, and will assign the token type to the first one that appears in the grammar.
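
To see why the rule order and the extra IDENTIFIER rule matter, here is a minimal Python sketch that imitates the two matching rules described above: the longest match wins, and a tie goes to the rule listed first. It is only a simulation for illustration, not ANTLR's actual implementation; the names tokenize, WITHOUT_IDENTIFIER and WITH_IDENTIFIER are made up for this example.

```python
import re

def tokenize(text, rules):
    """Imitate longest-match lexing; on a tie the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly greater: an earlier rule keeps the token on a tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # mimic 'WS: ... -> skip;'
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rule tables mirroring the lexer rules in the question.
KEYWORDS = [("NOT", r"not"), ("TRUE", r"true"), ("FALSE", r"false")]
WS = [("WS", r"[ \n\t\r]+")]

WITHOUT_IDENTIFIER = KEYWORDS + WS
# IDENTIFIER is listed after the keywords, as recommended above.
WITH_IDENTIFIER = KEYWORDS + [("IDENTIFIER", r"[a-zA-Z]+")] + WS

print(tokenize("nottrue", WITHOUT_IDENTIFIER))
print(tokenize("nottrue", WITH_IDENTIFIER))
print(tokenize("not true", WITH_IDENTIFIER))
```

Without IDENTIFIER, 'nottrue' splits into NOT and TRUE. With it, the seven-character IDENTIFIER match beats the three-character NOT match, while 'not' on its own still becomes NOT because the keyword rule comes first.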

Sam Harwell
    Thanks. That works as desired for 'nottrue' (invalid) and '!true' (valid). Do you also have an idea how I can make an exception to this rule, so that some other inputs may omit the whitespace? For example 'A B true', where the whitespace between A and B is optional. So 'AB true' is also valid, but 'ABtrue' is not. – flux Mar 19 '13 at 16:01

If you want to control how whitespace is handled, the most straightforward way is to tell ANTLR explicitly how to handle it, e.g. with WS+ in the parser rules. Why should ANTLR be able to guess how you want whitespace to be handled if you do not specify it?