I have written an LR(1) parser that can successfully parse strings in the language of my grammar into a Concrete Syntax Tree, but I am now trying to construct an Abstract Syntax Tree.
I am using an inheritance design for my AST nodes:
struct ASTNode {
virtual Type typeCheck() = 0;
}
struct IDNode : public ASTNode {
string name;
...
}
struct INTNode : public ASTNode {
int value;
...
}
struct BOPNode : public ASTNode {
ASTNode *pLeft;
ASTNode *pRight;
...
}
struct Add_BOPNode : public BOPNode {
...
}
struct ParamNode : public ASTNode {
string name;
ASTNode *pTypeSpecifier;
...
}
struct ParamListNode : public ASTNode {
vector<ParamNode*> params;
...
}
struct FuncDec : public ASTNode {
string functionName;
ASTNode *pFunctionBody;
ASTNode *pReturnType;
ASTNode *pParams;
...
}
When I perform a reduction in my LR(1) parser I generate a new node depending on the rule that was used for the reduction. This is pretty straightforward for most of the nodes, but I'm not sure of a clean way to implement a node that contains a list of other nodes.
Using the ParamListNode from above as an example:
struct stack_item {
int state;
int token;
string data;
ASTNode *node;
};
/// rule = the number of the rule being reduced on
/// rhs = the items on the right-hand side of the rule
ASTNode* makeNode(int rule, vector<stack_item> rhs) {
switch(rule) {
/// <expr> ::= <expr> '+' <term>
case 1: return new Add_BOPNode(rhs[0].node, rhs[2].node);
/// <param> ::= IDENT(data) ':' <type>
case 2: return new ParamNode(rhs[0].data, rhs[2].node);
/// <param_list> ::= <param>
case 3: return new ParamList(rhs[0].node);
/// <param_list> ::= <param_list> ',' <param>
case 4: {
auto list = dynamic_cast<ParamListNode*>(rhs[0].node);
list->params.push_back(rhs[2].node);
return list;
}
...
}
}
Since generating a node requires a subclass of ASTNode to be returned, I have to create a subclass that encloses a vector<> with each sub-node. However, since not every node needs to be a list structure, I have to dynamic_cast<> to the subclass before I can access the internal list.
I feel like there should be a cleaner way to handle a list of sub-nodes without having to rely on dynamic_cast<>.
Another question is about the FuncDec node. It has pParams which should be a ParamList (or vector<Param*> directly), but to do that I would have to dynamic_cast<> the incoming ASTNode to a ParamList or Param node. Again, I feel like there should be a way to not use dynamic_cast<>, but I can’t think of one.
Also, if you have any other suggestions about how I can better structure or implement anything that would be greatly appreciated :)