
Given the following basic grammar, I want to understand how to handle comment lines. What is missing is the handling of the <CR><LF> that usually terminates a comment line. The only exception is a final comment line directly before the EOF, e.g.:

# comment
abcd := 12 ;
# comment eof without <CR><LF>


grammar CommentLine1a;

//==========================================================
// Options
//==========================================================



//==========================================================
// Lexer Rules
//==========================================================

Int
  : Digit+
  ;

fragment Digit
  : '0'..'9'
  ;

ID_NoDigitStart
  : ( 'a'..'z' | 'A'..'Z' ) ('a'..'z' | 'A'..'Z' | Digit )*
  ;

Whitespace
  : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN ; }
  ; 


//==========================================================
// Parser Rules
//==========================================================

code
  : ( assignment | comment )+
  ;

assignment
  : id_NoDigitStart ':=' id_DigitStart ';'
  ;

id_NoDigitStart
  : ID_NoDigitStart
  ;  

id_DigitStart
  : ( ID_NoDigitStart | Int )+
  ;

comment
  : '#' ~( '\r' | '\n' )*
  ;
ANTLRStarter
  • What do you mean "handle" comment lines? Are you wondering how to parse them? – Jonathan M Aug 15 '11 at 21:00
  • It seems you are trying to handle the comments in your parser grammar. Normally they would be handled in the lexer, similar to your `Whitespace` rule. Are you sure you want to do this in the parser? – Jörn Horstmann Aug 15 '11 at 21:13
  • What's not working right now, specifically? I found this ANTLR mailing list posting from 2006 with basically the same question (and an answer, but it looks similar to what you already have): http://www.antlr.org/pipermail/antlr-interest/2006-January/015130.html – John Zwinck Aug 16 '11 at 00:38
  • Many thanks for your answers! @john: A very valuable link, special thanks. – ANTLRStarter Aug 16 '11 at 08:39

1 Answer

Unless you have a very compelling reason to put the comment inside the parser (which I'd like to hear), you should put it in the lexer:

Comment
  :  '#' ~( '\r' | '\n' )*
  ;

And since you already account for line breaks in your `Whitespace` rule, there's no problem with input like # comment eof without <CR><LF>.
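To make this concrete, here is a minimal sketch of how the question's grammar might look with the comment moved into the lexer. It assumes ANTLR v3 syntax (as in the question); the grammar name `CommentLine1b` is hypothetical, and the parser rule `comment` is kept only so that comments can still be processed further in the parser.

```antlr
grammar CommentLine1b;  // hypothetical name, not from the question

//==========================================================
// Parser Rules
//==========================================================

code
  : ( assignment | comment )+
  ;

assignment
  : ID_NoDigitStart ':=' ( ID_NoDigitStart | Int )+ ';'
  ;

// The parser now sees each comment as one single token.
comment
  : Comment
  ;

//==========================================================
// Lexer Rules
//==========================================================

// '#' followed by any characters up to (not including) the line break.
// A trailing comment at EOF simply ends where the input ends.
Comment
  : '#' ~( '\r' | '\n' )*
  ;

Int
  : Digit+
  ;

ID_NoDigitStart
  : ( 'a'..'z' | 'A'..'Z' ) ( 'a'..'z' | 'A'..'Z' | Digit )*
  ;

// Line breaks are swallowed here, so nothing else has to match <CR><LF>.
Whitespace
  : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN ; }
  ;

fragment Digit
  : '0'..'9'
  ;
```

If the comments are not needed in the parser at all, one option would be to drop the `comment` parser rule and add `{ $channel = HIDDEN ; }` to `Comment`, in the same way as for `Whitespace`.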

Also, if you use literal tokens inside parser rules, ANTLR automatically creates lexer rules for them behind the scenes. So in your case:

comment
  :  '#' ~( '\r' | '\n' )*
  ;

would match a '#' token followed by zero or more tokens other than '\r' and '\n', and not a '#' followed by zero or more characters other than '\r' and '\n'.

For future reference:

Inside parser rules

  • ~ negates tokens
  • . matches any token

Inside lexer rules

  • ~ negates characters
  • . matches any character in the range 0x0000 ... 0xFFFF
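
The difference can be sketched with two small rules. This is only an illustration in ANTLR v3 syntax; the rule names `skipToSemi`, `Str` and `Semi` are hypothetical and not part of the question's grammar.

```antlr
// Parser rule: ~Semi matches any single TOKEN except Semi,
// so this consumes whole tokens up to and including the next ';'.
skipToSemi
  : ( ~Semi )* Semi
  ;

// Lexer rule: ~'"' matches any single CHARACTER except '"',
// so this consumes characters up to the closing quote.
Str
  : '"' ( ~'"' )* '"'
  ;

Semi
  : ';'
  ;
```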
Bart Kiers
  • Thanks again, Bart, for that answer. I wanted to use a parser rule so I could process the comment line further. I have now tried a parser rule `comment` that consists of just the lexer rule `Comment`, which works, too. But is that the correct way? Are there any more or less common rules for when to use a lexer rule or a parser rule in such situations? – ANTLRStarter Aug 16 '11 at 08:36
  • @ANTLRStarter, comments are almost always single tokens. If you "promote" them to parser rules, you'll also need to remove `'\r'` and `'\n'` from `Whitespace` and create a `LineBreak` token: otherwise you won't be able to "see" when a `comment` (as a parser rule!) ends. – Bart Kiers Aug 16 '11 at 08:49