I am currently using the Java 8 grammar (Java8.g4) from this repo: https://github.com/antlr/grammars-v4

How can I modify Java8.g4 so that, when the input contains several consecutive newlines, only a single newline token is produced?

Following "Parsing Newlines, EOF as End-of-Statement Marker with ANTLR3", I can add newlines to the parse tree by adding NEWLINE: ('\r\n'|'\n'|'\r'); to the .g4 file. However, if the input has several consecutive newlines, each one is tokenized and added to the tree, which is not what I want.

Hope someone can help me out!

Thanks

teddy
  • You can't just remove the EOS (semicolon, I presume) from the grammar: it would become one big ambiguous mess. – Bart Kiers Aug 18 '17 at 08:12
  • The Java8.g4 in the repo I posted above does not add "\n" to its token list. What I am trying to do is add it to the list and possibly replace "\n" with EOF. I am not sure whether that is possible. – teddy Aug 18 '17 at 08:22
  • Okay, so you're not trying to remove EOS (not EOF?) but replacing it instead. Try it and see for yourself. If you get stuck, feel free to ask a specific question on SO. Good luck! – Bart Kiers Aug 18 '17 at 08:34

1 Answer

I guess you mean that whitespace is not kept in the token list produced by the lexer, right? This happens when whitespace is skipped in the grammar. Look for a rule such as

WS: [ \t] -> skip;

and change that to

WS: [ \t] -> channel(HIDDEN);

This way the whitespace tokens are kept on the hidden channel, where you can read them via the CommonTokenStream instance, but they do not get in the way of the parser (just like with skip).
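
For the narrower goal of getting only one token out of several consecutive newlines, a minimal sketch follows. It assumes the grammar's existing whitespace rule also matches \r and \n, so the line breaks have to be taken out of it first. The rule name NEWLINE comes from the question; the exact character sets are assumptions, not the contents of Java8.g4.

```antlr
// Sketch only: one NEWLINE token per run of consecutive line breaks.
// The + makes the lexer match the whole run as a single token.
NEWLINE : ( '\r'? '\n' | '\r' )+ ;

// The remaining whitespace rule no longer includes \r or \n, so it
// cannot consume the line breaks that NEWLINE is meant to match.
WS : [ \t]+ -> channel(HIDDEN) ;
```

Because NEWLINE stays on the default channel, the parser rules would then need to mention NEWLINE wherever a line break may appear; otherwise every line break is reported as a syntax error. Blank lines that contain spaces or tabs would still yield two NEWLINE tokens with this sketch, since the hidden WS token splits the run.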

Mike Lischke
  • Sorry, I was not clear earlier about what I wanted to ask. I have now modified the question; I hope you can help me out! – teddy Aug 21 '17 at 12:10