How do I implement this rule in ANTLR4?
multiline-comment-text-item -> Any Unicode scalar value except /* or */
In ANTLR, you cannot say "match any character except these multi-character sequences". The ~ operator only negates a set of single characters, so you can only say "match any character except these single characters":
ANY_EXCEPT_STAR : ~[*];
ANY_EXCEPT_FSLASH : ~[/];
But the rule
FOO : ~[/*];
matches any single character other than / and *; it does not exclude the two-character sequence /*.
I wouldn't match multiline-comment-text-item in a lexer rule of its own, but rather inside the multiline-comment-text rule where it's (most likely) used:
MultilineCommentText
: '/*' .*? '*/'
;
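Be sure to include the ? after .* so that the loop is non-greedy. Without it, the rule would run from the first /* to the last */ in the input and swallow everything in between.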
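For comparison, the same token can be sketched without the non-greedy operator, using only negated single-character sets. This is a hand-written equivalent for non-nested comments, not something the ANTLR documentation prescribes. The rule name ExplicitCommentText is hypothetical, and the rule is meant to replace MultilineCommentText, not to sit alongside it:

```antlr
// Sketch: everything from '/*' up to the first '*/', without '.*?'.
// The loop body is either a single non-star character, or a run of
// stars followed by a character that is neither '*' nor '/'.
// The trailing '*'+ '/' then consumes the closing delimiter.
ExplicitCommentText
 : '/*' ( ~[*] | '*'+ ~[*/] )* '*'+ '/'
 ;
```

The non-greedy version is shorter and easier to read, which is why it is usually preferred; the explicit form mainly shows how far single-character negation can be pushed.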
Note that such tokens are often discarded or hidden so that they never reach the parser rules. To discard them entirely, use:
MultilineCommentText
: '/*' .*? '*/' -> skip
;
or, to keep them in the token stream but off the default channel:
MultilineCommentText
: '/*' .*? '*/' -> channel(HIDDEN)
;
See: https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md
I ran into this rule while trying to parse Swift with ANTLR4. Here is my implementation, which calls itself recursively to handle nested comments:
MULTILINE_COMMENT
: '/*' ('/'*? MULTILINE_COMMENT | ('/'* | '*'*) ~[/*])*? '*'*? '*/'
;
There is no need to split multiline-comment into as many subrules as the Swift documentation does.
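A shorter recursive form is also commonly seen for nested comments. The sketch below assumes ANTLR4's support for recursive lexer rules and relies on the nested alternative being listed before the wildcard inside the non-greedy loop. The rule name NESTED_COMMENT is hypothetical, and it is worth testing against your own inputs before swapping it in for the rule above:

```antlr
// Sketch: a comment is '/*', then any mix of nested comments and
// single characters, up to the matching '*/'.
// The recursive alternative comes first so that an inner '/*' is
// tried as a nested comment before falling back to the wildcard.
NESTED_COMMENT
 : '/*' ( NESTED_COMMENT | . )*? '*/' -> channel(HIDDEN)
 ;
```

As with the earlier rules, replace channel(HIDDEN) with skip if the comments should be dropped from the token stream altogether.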