
How can I implement this rule in ANTLR4?

multiline-comment-text-item -> Any Unicode scalar value except /* or */

jeudesprit

2 Answers


In ANTLR, you cannot say "match any character except these multi-character sequences". You can only say "match any character except these single characters":

ANY_EXCEPT_STAR : ~[*];
ANY_EXCEPT_FSLASH : ~[/];

Note that FOO : ~[/*]; matches any single character except / and *. It does not exclude the two-character sequence /*.
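
One way to work within that restriction, sketched here for non-nested comments only, is to treat * specially: the body may contain any character other than *, or a run of * followed by a character that is neither * nor /. The rule name BLOCK_COMMENT is made up for illustration, and this is the classic C-style comment pattern, not something taken from the Swift grammar:

// Hypothetical rule name; does not handle nested comments.
BLOCK_COMMENT
 : '/*' ( ~[*] | '*'+ ~[*/] )* '*'+ '/'
 ;

The closing '*'+ '/' is the only way to consume a * that is directly followed by /, so the token ends at the first */.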

I wouldn't match multiline-comment-text-item in a lexer rule of its own, but rather inside the multiline-comment-text where it's (most likely) used:

MultilineCommentText
 : '/*' .*? '*/'
 ;

Be sure to include the ? after .*, which makes the match non-greedy, so the comment ends at the first */ rather than at the last one in the input.

Note that such tokens are often skipped or put on a hidden channel so that they don't end up in parser rules. In that case, use either:

MultilineCommentText
 : '/*' .*? '*/' -> skip
 ;

or

MultilineCommentText
 : '/*' .*? '*/' -> channel(HIDDEN)
 ;
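
To show how this fits into a complete grammar, here is a minimal sketch. The grammar name CommentDemo and the rules file, ID, COMMENT and WS are illustrative assumptions, not part of the question:

// Minimal sketch; the grammar and rule names are illustrative only.
grammar CommentDemo;

// Parser rule: comments never show up here, whichever lexer command is used.
file : ID* EOF ;

ID      : [a-zA-Z_]+ ;
// channel(HIDDEN) keeps the token in the token stream, off the default channel,
// so tools can still retrieve the comment text; skip would drop it entirely.
COMMENT : '/*' .*? '*/' -> channel(HIDDEN) ;
WS      : [ \t\r\n]+ -> skip ;

With channel(HIDDEN) the comment text remains available to tools such as formatters, whereas -> skip discards it during lexing.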

See: https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md

Bart Kiers

I ran into this rule while trying to parse Swift with ANTLR4. Here is my implementation:

MULTILINE_COMMENT
    : '/*' ('/'*? MULTILINE_COMMENT | ('/'* | '*'*) ~[/*])*? '*'*? '*/'
;

There is no need to split multiline-comment into as many subrules as the grammar in the documentation does.
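
For comparison, a more compact recursive form is also possible, since ANTLR4 allows lexer rules to refer to themselves. The rule name NESTED_COMMENT is hypothetical, and I have not checked this sketch against every edge case:

// Hypothetical rule name; the recursive reference handles nested comments.
NESTED_COMMENT
    : '/*' ( NESTED_COMMENT | . )*? '*/' -> channel(HIDDEN)
    ;

Whenever the loop sees a nested /*, it can match a complete inner comment through the recursive reference before looking for the outer closing */.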

H3NT41