I'm trying to create a documentation generator for several languages. For this I need an AST, in order to know that, for instance, one comment belongs to a class and another belongs to a method of that class.
I started by writing this simple Python code, which displays the tree by walking it recursively:
import sys

import antlr4
from ECMAScriptLexer import ECMAScriptLexer
from ECMAScriptParser import ECMAScriptParser


def handleTree(tree, lvl=0):
    # Walk the parse tree depth-first and print only the terminal nodes (tokens),
    # prefixed according to their depth in the tree.
    for child in tree.getChildren():
        if isinstance(child, antlr4.tree.Tree.TerminalNode):
            print(lvl*'│ ' + '└─', child)
        else:
            handleTree(child, lvl+1)


# Lex and parse the file given on the command line, then dump the tree.
input = antlr4.FileStream(sys.argv[1])
lexer = ECMAScriptLexer(input)
stream = antlr4.CommonTokenStream(lexer)
parser = ECMAScriptParser(stream)
tree = parser.program()
handleTree(tree)
I then tried to parse this JavaScript code with the ANTLR ECMAScript grammar:
var i = 52; // inline comment
function foo() {
    /** The foo documentation */
    console.log('hey');
}
This outputs:
│ │ │ │ └─ var
│ │ │ │ │ │ └─ i
│ │ │ │ │ │ │ └─ =
│ │ │ │ │ │ │ │ │ │ └─ 52
│ │ │ │ │ └─ ;
│ │ │ └─ function
│ │ │ └─ foo
│ │ │ └─ (
│ │ │ └─ )
│ │ │ └─ {
│ │ │ │ │ │ │ │ │ │ │ │ └─ console
│ │ │ │ │ │ │ │ │ │ │ └─ .
│ │ │ │ │ │ │ │ │ │ │ │ └─ log
│ │ │ │ │ │ │ │ │ │ │ └─ (
│ │ │ │ │ │ │ │ │ │ │ │ │ │ └─ 'hey'
│ │ │ │ │ │ │ │ │ │ │ └─ )
│ │ │ │ │ │ │ │ │ └─ ;
│ │ │ └─ }
└─ <EOF>
All the comments are ignored, probably because the comment rules in the grammar use channel(HIDDEN), which keeps those tokens away from the parser.
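If I understand the mechanism correctly, a hidden channel means the lexer still produces the comment tokens, but the parser only ever looks at the default channel. The toy sketch below illustrates that idea in plain Python. It is not the ANTLR API: the `Token` class, the channel constants, `tokenize` and `hidden_tokens_to_left` are all hypothetical names made up for this illustration, and the regex is a crude stand-in for a real JavaScript lexer.

```python
import re
from dataclasses import dataclass

DEFAULT_CHANNEL = 0  # tokens the parser consumes
HIDDEN_CHANNEL = 1   # tokens kept in the stream but skipped by the parser


@dataclass
class Token:
    text: str
    channel: int
    index: int  # position in the full token list (all channels)


# Crude toy lexer; comments come first so '//' is not split into two '/' tokens.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<ws>\s+)
  | (?P<word>[A-Za-z_]\w*|\d+)
  | (?P<string>'[^']*')
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)


def tokenize(source):
    """Return every token; comments are routed to HIDDEN_CHANNEL, whitespace is dropped."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup == "ws":
            continue
        channel = HIDDEN_CHANNEL if m.lastgroup == "comment" else DEFAULT_CHANNEL
        tokens.append(Token(m.group(), channel, len(tokens)))
    return tokens


def hidden_tokens_to_left(tokens, index):
    """Collect the run of hidden tokens immediately before tokens[index]."""
    found = []
    i = index - 1
    while i >= 0 and tokens[i].channel == HIDDEN_CHANNEL:
        found.append(tokens[i])
        i -= 1
    return found[::-1]


source = """var i = 52; // inline comment
function foo() {
/** The foo documentation */
console.log('hey');
}"""

tokens = tokenize(source)

# What a parser reading only the default channel would see: no comments at all.
parser_view = [t.text for t in tokens if t.channel == DEFAULT_CHANNEL]
print(parser_view)

# The comments are still in the full token list, next to the tokens they precede.
console = next(t for t in tokens if t.text == "console")
doc = [t.text for t in hidden_tokens_to_left(tokens, console.index)]
print(doc)
```

In this model the tree built from `parser_view` can never contain a comment, which matches the output above, yet the comment text remains reachable from the token list by index.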
After some googling I found a related question, with this answer:
Unless you have a very compelling reason to put the comment inside the parser (which I'd like to hear), you should put it in the lexer.
So, why should comments not be included in the parser, and how can I get a tree that includes comments?