I need to create a grammar for a language with forward references. I think that the easiest way to achieve this is to make several passes on the generated AST, but I need a way to store symbol information in the tree.

Right now my parser correctly generates an AST and computes scopes of the variables and function definitions. The problem is, I don't know how to save the scope information into the tree.

Fragment of my grammar:

composite_instruction
scope JScope;
@init {
    $JScope::symbols = new ArrayList();
    $JScope::name = "level "+ $JScope.size();
}
@after {
    System.out.println("code block scope " +$JScope::name + " = " + $JScope::symbols);
}
    : '{' instruction* '}' -> ^(INSTRUCTION_LIST instruction*)
    ;

I would like to put a reference to the current scope into the tree, something like:

    : '{' instruction* '}' -> ^(INSTRUCTION_LIST instruction* {$JScope::symbols})

Is it even possible? Is there any other way to store current scopes in a generated tree? I can generate the scope info in a tree grammar, but it won't change anything, because I still have to store it somewhere for the second pass on the tree.

bialpio
  • Does [this Q&A](http://stackoverflow.com/questions/4075510/how-to-implement-a-function-call-with-antlr-so-that-it-can-be-called-even-before) help? – Bart Kiers Nov 18 '10 at 00:14

1 Answer

To my knowledge, the syntax for rewrite rules doesn't allow directly assigning values the way your tentative snippet suggests. This is partly because the parser wouldn't know which node of the tree the values should be attached to.

However, one of the cool features of ANTLR-produced ASTs is that the parser makes no assumptions about the type of the nodes. One just needs to implement a TreeAdaptor, which serves as a factory for new nodes and as a navigator of the tree structure. One can therefore stuff whatever info may be needed into the nodes, as explained below.

ANTLR provides a default tree node implementation, CommonTree, and in most cases (as in the situation at hand) we merely need to

  • subclass CommonTree by adding some custom fields to it
  • subclass the CommonTreeAdaptor to override its create() method, i.e. the way it produces new nodes.

but one could also create a novel type of node altogether, for some odd graph structure or whatnot. For the case at hand, the following should be sufficient (adapt it for the specific target language if this isn't Java).

import java.util.ArrayList;
import org.antlr.runtime.tree.*;
import org.antlr.runtime.Token;

public class NodeWithScope extends CommonTree {

    /* Just declare the extra fields for the node */
    public ArrayList symbols;
    public String    name;
    public Object    whatever_else;

    public NodeWithScope (Token t) {
        super(t);
    }
}

/* TreeAdaptor: we just need to override create method */
class NodeWithScopeAdaptor extends CommonTreeAdaptor {
    public Object create(Token standardPayload) {
        return new NodeWithScope(standardPayload);
    }
}

One then needs to slightly modify the way the parsing process is started, so that ANTLR (or rather the ANTLR-produced parser) knows to use the NodeWithScopeAdaptor rather than the default CommonTreeAdaptor.
(This is step 4.1 below; the rest is a rather standard ANTLR test rig.)

// ***** Typical ANTLR pipe rig  *****
//  ** 1. input stream 
ANTLRInputStream input = new ANTLRInputStream(my_input_file);
//  ** 2. Lexer
MyGrammarLexer lexer = new MyGrammarLexer(input);
//  ** 3. token stream produced by lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
//  ** 4. Parser
MyGrammarParser parser = new MyGrammarParser(tokens);

//     4.1  !!! Specify the TreeAdaptor
NodeWithScopeAdaptor  adaptor = new NodeWithScopeAdaptor();
parser.setTreeAdaptor(adaptor); // use my adaptor

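//  ** 5. Start process by invoking the root rule
//        (with output=AST, ANTLR 3 generates a <rule>_return class for each rule)
MyGrammarParser.MyTopRule_return r = parser.MyTopRule();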
//  ** 6. AST tree
NodeWithScope  t = (NodeWithScope)r.getTree();
//  ** 7.  etc. parse the tree or do whatever is needed on it.

Finally, your grammar would have to be adapted along the following lines.
(Note that the node for the current rule is only available in the @after section. It may, however, reference any token attribute and other contextual variables from the grammar level, using the usual $rule.attribute notation.)

composite_instruction
scope JScope;
@init {
    $JScope::symbols = new ArrayList();
    $JScope::name = "level "+ $JScope.size();
}
@after {
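      // The casts are needed unless the grammar sets ASTLabelType=NodeWithScope.
      ((NodeWithScope)$composite_instruction.tree).symbols = $JScope::symbols;
      ((NodeWithScope)$composite_instruction.tree).name    = $JScope::name;
      ((NodeWithScope)$composite_instruction.tree).whatever_else
            = new myFancyObject($x.text, $y.line, whatever, blah);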
}
    : '{' instruction* '}' -> ^(INSTRUCTION_LIST instruction*)
    ;
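
Once the scope data sits on the nodes, a later pass can resolve forward references by walking up from each use to the enclosing scopes. Below is a minimal sketch of such a second pass. To keep it runnable without the ANTLR runtime it uses a hypothetical stand-in class, ScopedNode, instead of NodeWithScope; the CALL node label and the ScopeResolver helper are likewise made up for illustration and are not part of ANTLR or of the grammar above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for NodeWithScope: a plain tree node carrying scope data.
// It deliberately does not extend ANTLR's CommonTree, so the sketch has no dependencies.
class ScopedNode {
    final String text;                                   // token text, e.g. "CALL" or "g"
    final List<ScopedNode> children = new ArrayList<>();
    List<String> symbols;                                // non-null only on nodes that open a scope
    ScopedNode parent;

    ScopedNode(String text) { this.text = text; }

    // Attaches a child, records its parent link, and returns this node for chaining.
    ScopedNode add(ScopedNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

// Hypothetical second pass over a tree whose scopes were filled in during parsing.
class ScopeResolver {
    // True if 'name' is declared in the scope of 'node' or in any enclosing scope.
    static boolean isDeclared(ScopedNode node, String name) {
        for (ScopedNode n = node; n != null; n = n.parent) {
            if (n.symbols != null && n.symbols.contains(name)) {
                return true;
            }
        }
        return false;
    }

    // Collects the callee name of every CALL node that no enclosing scope declares.
    static List<String> unresolvedCalls(ScopedNode root) {
        List<String> unresolved = new ArrayList<>();
        collect(root, unresolved);
        return unresolved;
    }

    private static void collect(ScopedNode node, List<String> out) {
        if (node.text.equals("CALL") && !node.children.isEmpty()) {
            String callee = node.children.get(0).text;
            if (!isDeclared(node, callee)) {
                out.add(callee);
            }
        }
        for (ScopedNode child : node.children) {
            collect(child, out);
        }
    }
}
```

Because every scope is complete by the time the first pass ends, the order of definition and use inside a block no longer matters: a call to a function defined later in the same block resolves, and only names that no enclosing scope declares end up in the list returned by unresolvedCalls.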
mjv
  • This is probably what I was looking for, but I decided to build my own tree during parsing and then populate it with scopes and perform other checks. Thanks! – bialpio Dec 18 '10 at 17:48