2

I have written an LR(1) parser that can successfully parse strings in the language of my grammar into a Concrete Syntax Tree, but I am now trying to construct an Abstract Syntax Tree.

I am using an inheritance design for my AST nodes:

struct ASTNode {
    virtual Type typeCheck() = 0;
}

struct IDNode : public ASTNode {
    string name;
    ...
}

struct INTNode : public ASTNode {
    int value;
    ...
}

struct BOPNode : public ASTNode {
    ASTNode *pLeft;
    ASTNode *pRight;
    ...
}

struct Add_BOPNode : public BOPNode {
    ...
}

struct ParamNode : public ASTNode {
    string name;
    ASTNode *pTypeSpecifier;
    ...
}

struct ParamListNode : public ASTNode {
    vector<ParamNode*> params;
    ...
}

struct FuncDec : public ASTNode {
    string functionName;
    ASTNode *pFunctionBody;
    ASTNode *pReturnType;
    ASTNode *pParams;
    ...
}

When I perform a reduction in my LR(1) parser I generate a new node depending on the rule that was used for the reduction. This is pretty straightforward for most of the nodes, but I'm not sure of a clean way to implement a node that contains a list of other nodes.

Using the ParamListNode from above as an example:

struct stack_item {
    int state;
    int token;
    string data;
    ASTNode *node;
};

/// rule = the number of the rule being reduced on
/// rhs = the items on the right-hand side of the rule

ASTNode* makeNode(int rule, vector<stack_item> rhs) {
    switch(rule) {
        /// <expr> ::= <expr> '+' <term>
        case 1: return new Add_BOPNode(rhs[0].node, rhs[2].node);

        /// <param> ::= IDENT(data) ':' <type>
        case 2: return new ParamNode(rhs[0].data, rhs[2].node);

        /// <param_list> ::= <param>
        case 3: return new ParamList(rhs[0].node);

        /// <param_list> ::= <param_list> ',' <param>
        case 4: {
            auto list = dynamic_cast<ParamListNode*>(rhs[0].node);
            list->params.push_back(rhs[2].node);
            return list;
        }

        ...
    }
}

Since generating a node requires a subclass of ASTNode to be returned, I have to create a subclass that encloses a vector<> with each sub-node. However, since not every node needs to be a list structure, I have to dynamic_cast<> to the subclass before I can access the internal list.

I feel like there should be a cleaner way to handle a list of sub-nodes without having to rely on dynamic_cast<>.

Another question is about the FuncDec node. It has pParams which should be a ParamList (or vector<Param*> directly), but to do that I would have to dynamic_cast<> the incoming ASTNode to a ParamList or Param node. Again, I feel like there should be a way to not use dynamic_cast<>, but I can’t think of one.

Also, if you have any other suggestions about how I can better structure or implement anything that would be greatly appreciated :)

Sarathi
  • 1,023
  • 1
  • 14
  • 22
  • What does `stack_item` look like? – Simon Kraemer Feb 19 '16 at 16:30
  • @SimonKraemer I've added the stack_item struct definition to the top of the second code block. – Sarathi Feb 19 '16 at 17:29
  • Hmm. Maybe I am getting something wrong but what for do you need the `stack_item`s for? Wouldn't it be easier to construct the nodes directly using the concrete types? – Simon Kraemer Feb 19 '16 at 17:56
  • @SimonKraemer The stack_item is the structure that goes on my parse stack during the LR parsing. I needed to keep track of what part of the AST belonged to each item on the parse stack so I could link them together during a reduction. That's why I added the `node` property to the stack_item. Also, some symbols on the RHS of a rule are terminals with data associated with them like IDENT(data), so I also needed to be able to pass that data to ASTNode's that needed it. – Sarathi Feb 19 '16 at 18:07
  • You could create a union node that has either traditional children or a list of children, and simply access what you want. You lose some type safety because you might mistakenly access the list when you should have accessed a child, but then any coding error you mistake is a problem, so don't do that :-} – Ira Baxter Aug 11 '18 at 14:44

1 Answers1

1

My LRSTAR Parser Generator creates an abstract-syntax tree (AST) by using only one class, Node. Each node is the same structure, a pointer to the token (in the symbol table if a leaf node), and pointers to parent, child and next nodes. The next pointer allows you to have a list of nodes (multiple children for a parent node). This has worked well for many years.

During processing of the AST, it is the function associated with the node which takes care of the processing of the node. For example, the add function will do something different than the subtract function. The functions are different, instead of having a different class for each type of node.

Here is the node structure that I use:

  class Node
  {
     public:
     int    id;      // Node id number              
     int    prod;    // Production (rule) number           
     int    sti;     // Symbol-table index (perm or temp var).
     int    prev;    // Previous node.                          
     int    next;    // Next node.                          
     int    line;    // Line number.                      
     int    child;   // Child node.                       
     int    parent;  // Parent node.
  };