I am developing an application that analyzes Java source code using ANTLR v4. I want to match every single-line comment whose first word is TODO (e.g. // TODO <some-comment>) together with the statement that directly follows it.
Sample code:
class Simple {
    public static void main(String[] args) {
        // TODO develop cycle
        for (int i = 0; i < 5; i++) {
            // unmatched comment
            System.out.println("hello");
        }
        // TODO atomic
        int a;
        // TODO revision required
        {
            int b = a+4;
            System.out.println(b);
        }
    }
}
The result should be a map like this:
"develop cycle" -> for(...){...}
"atomic" -> int a
"revision required" -> {...}
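The map keys are simply the comment text after TODO. Below is a minimal Java sketch of that extraction step. It assumes TODO must be the first word after //. The class name TodoKeys, the helper todoKey and the hard-coded comment/statement pairs are all hypothetical; in the real application the pairs would come from the parse.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TodoKeys {
    // "//", optional spaces, the word TODO, optional spaces; group 1 is the comment body.
    // Assumption: TODO must be the first word of the comment.
    private static final Pattern TODO = Pattern.compile("//\\s*TODO\\b\\s*(.*)");

    /** Returns the text after "TODO", or null if the comment is not a TODO comment. */
    static String todoKey(String commentText) {
        Matcher m = TODO.matcher(commentText.trim());
        return m.matches() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical (comment, following statement) pairs standing in for what a
        // parser-based pass would produce; statement text is abbreviated.
        String[][] pairs = {
            {"// TODO develop cycle", "for(...){...}"},
            {"// unmatched comment", "System.out.println(\"hello\");"},
            {"// TODO atomic", "int a;"},
            {"// TODO revision required", "{...}"},
        };
        // LinkedHashMap keeps the entries in source order.
        Map<String, String> result = new LinkedHashMap<>();
        for (String[] p : pairs) {
            String key = todoKey(p[0]);
            if (key != null) {
                result.put(key, p[1]);
            }
        }
        result.forEach((k, v) -> System.out.println("\"" + k + "\" -> " + v));
    }
}
```

Running main prints the three TODO entries in source order; the unmatched comment is dropped because todoKey returns null for it.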
Following the official book (1) and similar questions on Stack Overflow ((2), (3), (4), (5), (6)), I tried several approaches.
At first I tried a dedicated COMMENTS channel as described in (1) and (2), but the following error occurred: rule 'LINE_COMMENT' contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
I think it would be much nicer to parse the source code so that all single-line comments are ignored EXCEPT those beginning with TODO. I hope it is possible to add TODO comments directly to the parse tree so that I can use listeners/walkers. Then I would only need to register a listener/walker for TODO comments, extract the following statement, and add both to the map.
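One way to attach TODO comments to the following statement without changing any parser rules is sketched below. It assumes the comments are kept in the token stream on a non-default channel instead of being skipped. ANTLR's BufferedTokenStream (and so CommonTokenStream) has getHiddenTokensToLeft(tokenIndex, channel), which returns the off-channel tokens directly before a given token, or null if there are none. A listener could call it from enterBlockStatement with ctx.getStart().getTokenIndex(). This assumes the Java8 grammar's blockStatement rule, which covers both statements and local declarations such as int a;. So that the sketch compiles without the ANTLR runtime, the Tok class and hiddenToLeft method are hypothetical stand-ins that mimic that lookup on a hand-built token list. Using the predefined HIDDEN channel (value 1) also avoids defining a custom COMMENTS constant.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenTokenLookup {
    static final int DEFAULT_CHANNEL = 0; // same value as ANTLR's Token.DEFAULT_CHANNEL
    static final int HIDDEN = 1;          // same value as ANTLR's Token.HIDDEN_CHANNEL

    // Hypothetical stand-in for an ANTLR Token: just its text and channel.
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // Mimics BufferedTokenStream.getHiddenTokensToLeft(index, channel): walk left from
    // index, collect tokens on `channel`, stop at the first default-channel token.
    // Unlike ANTLR, which returns null when nothing is found, this returns an empty list.
    static List<Tok> hiddenToLeft(List<Tok> tokens, int index, int channel) {
        List<Tok> found = new ArrayList<>();
        for (int i = index - 1; i >= 0; i--) {
            Tok t = tokens.get(i);
            if (t.channel == DEFAULT_CHANNEL) {
                break;
            }
            if (t.channel == channel) {
                found.add(0, t); // prepend to keep source order
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Part of the sample, lexed with TODO comments on HIDDEN, other comments and
        // whitespace skipped (most statement tokens omitted for brevity).
        List<Tok> tokens = List.of(
                new Tok("{", DEFAULT_CHANNEL),
                new Tok("// TODO develop cycle", HIDDEN),
                new Tok("for", DEFAULT_CHANNEL),    // index 2: start of the for statement
                new Tok("{", DEFAULT_CHANNEL),
                new Tok("System", DEFAULT_CHANNEL), // index 4: start of the println statement
                new Tok("}", DEFAULT_CHANNEL),
                new Tok("// TODO atomic", HIDDEN),
                new Tok("int", DEFAULT_CHANNEL));   // index 7: start of "int a;"
        // In a listener these indexes would come from ctx.getStart().getTokenIndex().
        for (int start : new int[] {2, 4, 7}) {
            List<Tok> before = hiddenToLeft(tokens, start, HIDDEN);
            // Take the comment closest to the statement, if any.
            String comment = before.isEmpty() ? "(none)" : before.get(before.size() - 1).text;
            System.out.println(tokens.get(start).text + " <- " + comment);
        }
    }
}
```

With this design the parse tree stays exactly as the unmodified grammar produces it; the TODO text is recovered from the token stream, which the listener needs a reference to.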
I've been modifying the official Java8 grammar for two days without success. Either the compiler complains or the resulting tree is mangled.
This is the change I made:
// ...
COMMENT
: '/*' .*? '*/' -> skip
;
TODO_COMMENT
: '// TODO' ~[\r\n]*
;
LINE_COMMENT
: '//' ~[\r\n]* -> skip
;
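ANTLR resolves overlapping lexer rules by taking the longest match. When two rules match the same length, the rule defined first wins. For a line like // TODO atomic, TODO_COMMENT and LINE_COMMENT both match the whole line, so TODO_COMMENT wins only because it is listed first.

The sketch below models just that tie-breaking with java.util.regex approximations of the three rules. The regex translations and the names CommentRuleOrder and classify are my own; a real ANTLR lexer does not work with regexes. It also shows that the literal '// TODO' requires exactly one space, so //TODO falls back to LINE_COMMENT. Note that TODO_COMMENT, unlike the other two rules, has no lexer command, so its tokens reach the parser on the default channel.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentRuleOrder {
    // Regex approximations of the three lexer rules, kept in grammar order.
    // (?s) lets '.' cross newlines, like ANTLR's '.' in '/*' .*? '*/'.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("COMMENT", Pattern.compile("(?s)/\\*.*?\\*/"));
        RULES.put("TODO_COMMENT", Pattern.compile("// TODO[^\\r\\n]*"));
        RULES.put("LINE_COMMENT", Pattern.compile("//[^\\r\\n]*"));
    }

    /** Rule chosen at the start of input: longest match, ties go to the earlier rule. */
    static String classify(String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors the match at the start of the input, as a lexer does.
            // Strict '>' means a later rule of equal length never replaces an earlier one.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best; // null if no rule matches
    }

    public static void main(String[] args) {
        String[] samples = {"// TODO atomic", "// unmatched comment", "//TODO no space", "/* block */"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```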
Can anyone please help? Grammars are not my cup of tea. Thanks in advance.
EDIT1:
The grammar modification posted above compiles without errors, but the following tree is generated (note the nodes marked in red, including int):
EDIT2:
Using the code sample above, calling parser.compilationUnit(); produces the following errors:
line 3:2 extraneous input '// TODO develop cycle;' expecting {'abstract', 'assert', 'boolean', 'break', 'byte', 'char', 'class', 'continue', 'do', 'double', 'enum', 'final', 'float', 'for', 'if', 'int', 'interface', 'long', 'new', 'private', 'protected', 'public', 'return', 'short', 'static', 'strictfp', 'super', 'switch', 'synchronized', 'this', 'throw', 'try', 'void', 'while', IntegerLiteral, FloatingPointLiteral, BooleanLiteral, CharacterLiteral, StringLiteral, 'null', '(', '{', '}', ';', '<', '!', '~', '++', '--', '+', '-', Identifier, '@'}
line 8:2 extraneous input '// TODO atomic;' expecting {'abstract', 'assert', 'boolean', 'break', 'byte', 'char', 'class', 'continue', 'do', 'double', 'enum', 'final', 'float', 'for', 'if', 'int', 'interface', 'long', 'new', 'private', 'protected', 'public', 'return', 'short', 'static', 'strictfp', 'super', 'switch', 'synchronized', 'this', 'throw', 'try', 'void', 'while', IntegerLiteral, FloatingPointLiteral, BooleanLiteral, CharacterLiteral, StringLiteral, 'null', '(', '{', '}', ';', '<', '!', '~', '++', '--', '+', '-', Identifier, '@'}
line 11:2 extraneous input '// TODO revision required;' expecting {'abstract', 'assert', 'boolean', 'break', 'byte', 'char', 'class', 'continue', 'do', 'double', 'enum', 'final', 'float', 'for', 'if', 'int', 'interface', 'long', 'new', 'private', 'protected', 'public', 'return', 'short', 'static', 'strictfp', 'super', 'switch', 'synchronized', 'this', 'throw', 'try', 'void', 'while', IntegerLiteral, FloatingPointLiteral, BooleanLiteral, CharacterLiteral, StringLiteral, 'null', '(', '{', '}', ';', '<', '!', '~', '++', '--', '+', '-', Identifier, '@'}
So the grammar is obviously incorrect, as it fails even on this simple example.
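These "extraneous input" errors are what you would expect if TODO_COMMENT tokens sit on the default channel. A parser reading a CommonTokenStream only sees default-channel tokens, so it receives the comment exactly where the grammar expects a statement.

One possible fix, which I have not tested against this grammar, is to give the rule a lexer command such as -> channel(HIDDEN). That keeps the token in the stream, so it can still be looked up, but hides it from the parser. The sketch below models what the parser sees in each case. Tok and parserView are hypothetical stand-ins for ANTLR's Token and CommonTokenStream, used so the code runs without the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.List;

public class ParserView {
    static final int DEFAULT_CHANNEL = 0; // same value as ANTLR's Token.DEFAULT_CHANNEL
    static final int HIDDEN = 1;          // same value as ANTLR's Token.HIDDEN_CHANNEL

    // Hypothetical stand-in for an ANTLR Token: just its text and channel.
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    /** Texts of the tokens a parser would consume: default-channel tokens only. */
    static List<String> parserView(List<Tok> tokens) {
        List<String> seen = new ArrayList<>();
        for (Tok t : tokens) {
            if (t.channel == DEFAULT_CHANNEL) {
                seen.add(t.text);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // "// TODO atomic" followed by "int a;", lexed with TODO_COMMENT as posted (no command).
        List<Tok> asPosted = List.of(new Tok("// TODO atomic", DEFAULT_CHANNEL),
                new Tok("int", DEFAULT_CHANNEL), new Tok("a", DEFAULT_CHANNEL), new Tok(";", DEFAULT_CHANNEL));
        // The same input if TODO_COMMENT had "-> channel(HIDDEN)".
        List<Tok> hidden = List.of(new Tok("// TODO atomic", HIDDEN),
                new Tok("int", DEFAULT_CHANNEL), new Tok("a", DEFAULT_CHANNEL), new Tok(";", DEFAULT_CHANNEL));
        System.out.println(parserView(asPosted)); // [// TODO atomic, int, a, ;]
        System.out.println(parserView(hidden));   // [int, a, ;]
    }
}
```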