How to tokenize a Java 8 program using ANTLR

Question

I am currently using the Java8.g4 grammar for Java 8 from this repo: https://github.com/antlr/grammars-v4

How can I modify Java8.g4 so that, when the input contains several consecutive newlines, only one newline token is produced?

Following "Parsing Newlines, EOF as End-of-Statement Marker with ANTLR3", I can get newlines into the parse tree by adding NEWLINE: ('\r\n'|'\n'|'\r') to the .g4 file. However, if the input has several consecutive newlines, each one is tokenized and added to the tree, which is not what I want.

Hope someone can help me out!

Thanks

You can't just remove the EOS (semicolon, I presume) from the grammar: it would become one big ambiguous mess. – Bart Kiers, Aug 18 '17 at 08:12
The Java8.g4 in the repo I posted above doesn't have "\n" in its token list. What I am trying to do is add it to the list and possibly replace "\n" with EOF. I am not sure if that is possible. – teddy, Aug 18 '17 at 08:22
Okay, so you're not trying to remove EOS (not EOF?) but replacing it instead. Try it and see for yourself. If you get stuck, feel free to ask a specific question on SO. Good luck! – Bart Kiers, Aug 18 '17 at 08:34

score 1 · Accepted Answer · answered Aug 19 '17 at 09:20

I guess you mean that whitespace is not kept in the token list produced by the lexer, right? This happens when whitespace is skipped in the grammar. Look for a rule such as

WS: [ \t] -> skip;

and change that to

WS: [ \t] -> channel(HIDDEN);

This way the whitespace tokens are kept on the hidden channel, where you can read them via the CommonTokenStream instance, but they do not get in the way of the parser (just like with skip).
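
For the newline case in the question, the same idea can be combined with a rule that matches a whole run of line terminators at once. The following is only a sketch under assumptions, not the rules from the repo: the NEWLINE name and the exact character sets are illustrative, and it assumes any existing WS rule is changed so that it no longer matches \r or \n (otherwise the rule defined first wins when both match the same text).

```antlr
// Sketch of lexer-rule changes for Java8.g4 (assumed, not taken from the repo).

// Whitespace without line terminators, kept on the hidden channel.
WS
    : [ \t\u000C]+ -> channel(HIDDEN)
    ;

// One or more consecutive line terminators become a single NEWLINE token.
NEWLINE
    : ( '\r'? '\n' | '\r' )+
    ;
```

Because of the + at the end, "\n\n\n" is emitted as one NEWLINE token instead of three. Blank lines that contain spaces or tabs would still split the run into several NEWLINE tokens. Note also that NEWLINE stays on the default channel here, so the parser rules would have to reference it; if only the token stream matters, it could be sent to the hidden channel as well.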

Mike Lischke

Sorry, I was not very clear earlier about what I wanted to ask. I have now modified the question; I hope you can help me out! – teddy Aug 21 '17 at 12:10
