Suppose a line has a maximum length of 5. I want an Identifier to continue across a newline character when that newline sits at position 5.

examples:

  • abcd'\n'ef would result in a single Identifier "abcdef"
  • ab'\n'def would result in Identifier "ab" (and another one "def")

Somehow I cannot get it working...

Attempt 1 is something like:

NEWLINE1  : '\r'? '\n' { _tokenStartCharPositionInLine == 5 }? -> skip;
NEWLINE2  : '\r'? '\n' { _tokenStartCharPositionInLine < 5 }? -> channel(WHITESPACE);

Identifier    : Letter (LetterOrDigit)*;

fragment
Letter        : [a-zA-Z];

fragment
LetterOrDigit : [a-zA-Z0-9];

Attempt 2 is something like:

WS  :   (' ' | '\t' | '\n' | '\r' | '\f')+ -> channel(WHITESPACE);

Identifier    : Letter (LetterOrDigit NEWLINE?)*;

NEWLINE:   '\r'? '\n' { _tokenStartCharPositionInLine == 5}? -> skip;

fragment
Letter        : [a-zA-Z];

fragment
LetterOrDigit : [a-zA-Z0-9];

This seems to work; however, the '\n' character is still part of the Identifier when I process it in the parser. I have not managed to ignore the newline when it is at the last position of a line.

1 Answer

This seems to work, however the '\n' sign is still part of the Identifier when processing it in the parser.

That is because NEWLINE is only skipped when it is matched "independently", as a token of its own. Whenever it is matched as part of another rule, like Identifier, its text stays part of that rule's token.

IMO, you should just go for this solution and not add too many predicates to your lexer (or parser). Simply strip the line break from the Identifier after parsing.
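A minimal sketch of that post-processing step, assuming the Java target. The class and method names (IdentifierText, stripLineBreaks) are hypothetical and not part of ANTLR; in a listener or visitor the raw text would typically come from the token's getText().

```java
public final class IdentifierText {
    private IdentifierText() {}

    /**
     * Removes every '\r' and '\n' from the raw text of an Identifier token,
     * so that a line break embedded in the token does not reach later stages.
     */
    public static String stripLineBreaks(String rawText) {
        return rawText.replace("\r", "").replace("\n", "");
    }

    public static void main(String[] args) {
        // Identifier text as the lexer from attempt 2 would hand it over.
        System.out.println(stripLineBreaks("abcd\nef"));   // prints "abcdef"
        System.out.println(stripLineBreaks("abcd\r\nef")); // prints "abcdef"
    }
}
```

This keeps the grammar simple: the lexer still decides where an Identifier may span a line break, and the clean-up happens in one place in ordinary Java code.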

Bart Kiers