
Total clang/llvm n00b here. I'm building a multilingual static analysis tool in which parsers for individual languages export an appropriately tagged, language-agnostic AST for a single analyser binary to process. So far I've been using native parsers for Ruby and Go and an ANTLR-based one for Swift.

ANTLR in particular won my heart because of its Listener API, which provides three generic methods:

void enterEveryRule(ParserRuleContext ctx) 
void exitEveryRule(ParserRuleContext ctx) 
void visitTerminal(TerminalNode node) 

You can also subscribe to enter and exit events for particular rules that you find interesting. So for a given rule you'd have four callbacks triggered - enterEveryRule, enterParticularRule, exitParticularRule, exitEveryRule - in this particular order. There is also the concept of TerminalNodes, which are the leaves of the parse tree: things like literals.
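To make the ordering concrete, here is a minimal sketch of that callback sequence over a toy tree. `Node`, `Listener`, `walk` and `demoTrace` are hypothetical names made up for illustration; none of this is ANTLR or clang API, it only mimics the behaviour described above.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy tree node (not a clang or ANTLR type).
struct Node {
    std::string rule;            // rule name, or token text for a terminal
    std::vector<Node> children;  // empty for terminals
    bool terminal;               // true for leaves such as literals
};

// Hypothetical listener: three generic callbacks plus per-rule subscriptions.
struct Listener {
    using Callback = std::function<void(const Node&)>;
    Callback enterEveryRule, exitEveryRule, visitTerminal;
    std::map<std::string, Callback> enterRule, exitRule;
};

// Depth-first walk. The "exit" callbacks fire only after the recursive
// calls for all children have returned, i.e. once the subtree is done.
void walk(const Node& n, const Listener& l) {
    if (n.terminal) {
        if (l.visitTerminal) l.visitTerminal(n);
        return;
    }
    if (l.enterEveryRule) l.enterEveryRule(n);
    auto in = l.enterRule.find(n.rule);
    if (in != l.enterRule.end()) in->second(n);

    for (const Node& child : n.children) walk(child, l);

    auto out = l.exitRule.find(n.rule);
    if (out != l.exitRule.end()) out->second(n);
    if (l.exitEveryRule) l.exitEveryRule(n);
}

// Records the callback order for the toy expression "1 + 2".
std::vector<std::string> demoTrace() {
    Node tree{"expr",
              {Node{"1", {}, true}, Node{"+", {}, true}, Node{"2", {}, true}},
              false};
    std::vector<std::string> trace;
    Listener l;
    l.enterEveryRule = [&](const Node& n) { trace.push_back("enterEveryRule " + n.rule); };
    l.exitEveryRule = [&](const Node& n) { trace.push_back("exitEveryRule " + n.rule); };
    l.visitTerminal = [&](const Node& n) { trace.push_back("visitTerminal " + n.rule); };
    l.enterRule["expr"] = [&](const Node&) { trace.push_back("enterExpr"); };
    l.exitRule["expr"] = [&](const Node&) { trace.push_back("exitExpr"); };
    walk(tree, l);
    return trace;
}
```

Calling `demoTrace()` yields the generic enter, the rule-specific enter, one `visitTerminal` per leaf, then the rule-specific exit and the generic exit. The exit callbacks are the natural place to pop whatever the matching enter callback pushed onto a stack.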

Since this sort of API is what feels most natural for the task at hand, I would kindly like to solicit your advice on how to go about replicating it in clang. I went through this SO answer and got some inspiration, but I still don't know how to handle terminal nodes.

Michael
Marcin Wyszynski
  • And your question is? – πάντα ῥεῖ Feb 21 '16 at 15:34
  • How is this different than "walk the tree" and test for the nodes you care about? As a matter of practicality, you really want a lot more capability for pattern matching against complex trees (try writing the code to recognize a complex statement) and extracting facts from near and far points in the code. This API gives almost nothing for that. – Ira Baxter Feb 21 '16 at 16:34
  • @IraBaxter I'm using a stack to generate the language-agnostic AST (a Protocol Buffers message) so I like how I can take something off the stack in an `onExit` callback. That's what I'm trying to figure out. – Marcin Wyszynski Feb 21 '16 at 16:54
  • @πάνταῥεῖ There are two questions, really - 1) how to provide `onExit` callbacks; 2) what would be clang's equivalent of ANTLR's terminal nodes. – Marcin Wyszynski Feb 21 '16 at 16:55
  • If you do a treewalk, you can still manage your own stack and push and pop as you like, so I still don't see the point. (I don't believe in the "language agnostic AST"; it's too hard to model the precise semantics, e.g., what will you do with "add" to capture the differences? So good luck with that.) – Ira Baxter Feb 21 '16 at 18:24
  • @IraBaxter It's no longer an AST, really, but something that the analyser consumes to calculate a number of metrics. So no need to model the precise semantics. What I don't understand is how the AST walk tells me that we've already processed an entire subtree (which is what would trigger the `onExit` callback). For example, this is accomplished in the Go API by a call with a nil argument: https://golang.org/pkg/go/ast/#Visitor – Marcin Wyszynski Feb 21 '16 at 18:53

0 Answers