Polymorphic Abstract Syntax Tree (recursive descent parser): impossible?

Question

I have begun writing a polymorphic recursive descent parser in C++. However, I am running into an issue. The classes are set up like this:

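#include <string>
#include <vector>

class Node {
public:
    virtual ~Node() = default;  // needed so derived nodes can be deleted through a Node*
    std::vector<Node*> children;
};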

class NodeBinary : public Node {
public:
    Node* left;
    Node* right;
};

class NodeUnary : public Node {
public:
    Node* operand;
};

class NodeVar : public Node {
public:
    std::string string;
    explicit NodeVar(std::string str) : string(str) {}
};

class NodeNumber : public Node {
public:
    signed long number;
    explicit NodeNumber(signed long n) : number(n) {}
};

// etc.

And then classes like NodeDeclaration, NodeCall, NodeNot, NodeAssignment, NodePlus, NodeMinus, NodeIf etc. will inherit either from Node or something less generic like NodeBinary or NodeUnary.

However, some of them take more specific operands. NodeAssignment always takes a var and a number/expression. So I will have to override Node* left to NodeVar* left and NodeExpr* right. The problem comes in with things like NodePlus. Left can be a NodeVar or a NodeExpr! And the root node has a similar problem: while parsing at the top level to add children nodes to root, how is it possible to tell if a child is a NodeExpr, a NodePlus, a NodeIf etc...?

I could have all Nodes carry an enum "type" that says what kind of node each one is, but then what's the point of having a nice polymorphic inheritance tree?

How is this problem normally solved?

score 0 · Accepted Answer · answered Mar 07 '17 at 07:49

If you're using class inheritance for your AST nodes, you need to create an appropriate inheritance hierarchy, as with any object-oriented design.

So, for example, NodeAssignment (which is presumably a specialization of NodeStatement) needs to contain a NodeLValue (of which a NodeVariable is a specialization) and a NodeValue. As usual, LValues (i.e. things you can assign to) are a subset of Values, so NodeLValue will be a specialization of NodeValue. And so on. Your binary operator node will contain left and right members, both of which are NodeValue base objects (I would expect NodeValue to be pure virtual, with a large number of specific specializations.)
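As a rough illustration of that hierarchy, here is a minimal sketch. It assumes the Node prefix has been dropped in favour of a node namespace; the Env map and the eval, assign and exec member functions are hypothetical names added only to show that each node does its own work, so nothing ever has to ask a node what kind it is.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace node {

// Hypothetical evaluation environment: variable name -> current value.
using Env = std::map<std::string, long>;

// Abstract base for anything that produces a value.
struct Value {
    virtual ~Value() = default;
    virtual long eval(const Env& env) const = 0;
};

// An LValue is a Value that can additionally be assigned to.
struct LValue : Value {
    virtual void assign(Env& env, long v) const = 0;
};

struct Number : Value {
    long number;
    explicit Number(long n) : number(n) {}
    long eval(const Env&) const override { return number; }
};

struct Variable : LValue {
    std::string name;
    explicit Variable(std::string n) : name(std::move(n)) {}
    // at() throws std::out_of_range if the variable was never assigned.
    long eval(const Env& env) const override { return env.at(name); }
    void assign(Env& env, long v) const override { env[name] = v; }
};

// Both operands are plain Values; Plus never needs to know which kind.
struct Plus : Value {
    std::unique_ptr<Value> left, right;
    Plus(std::unique_ptr<Value> l, std::unique_ptr<Value> r)
        : left(std::move(l)), right(std::move(r)) {}
    long eval(const Env& env) const override {
        return left->eval(env) + right->eval(env);
    }
};

struct Statement {
    virtual ~Statement() = default;
    virtual void exec(Env& env) const = 0;
};

// The target must be an LValue; the source can be any Value.
struct Assignment : Statement {
    std::unique_ptr<LValue> target;
    std::unique_ptr<Value> source;
    Assignment(std::unique_ptr<LValue> t, std::unique_ptr<Value> s)
        : target(std::move(t)), source(std::move(s)) {}
    void exec(Env& env) const override { target->assign(env, source->eval(env)); }
};

}  // namespace node
```

With these types, trying to build an Assignment whose target is a Number fails to compile, because a std::unique_ptr<Number> does not convert to std::unique_ptr<LValue>.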

If you insist on using a recursive descent parser, each parsing function needs to return an appropriate subclass of Node, so that the function which parses the left-hand side of an assignment would logically return a NodeLValue*, ready to insert into the NodeAssignment constructor. (Frankly, I'd ditch the word Node in all of those class names. Put them all into the namespace node:: and save yourself some typing.)
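To make the return types concrete, here is a small sketch of such a parser for a toy grammar (assignment := lvalue '=' expr, expr := primary ('+' primary)*). The grammar, the Parser class, its function names and the str() helper are all assumptions made for illustration, not something taken from the question.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace node {
struct Value {
    virtual ~Value() = default;
    virtual std::string str() const = 0;  // hypothetical helper for printing
};
struct LValue : Value {};
struct Number : Value {
    long number;
    explicit Number(long n) : number(n) {}
    std::string str() const override { return std::to_string(number); }
};
struct Variable : LValue {
    std::string name;
    explicit Variable(std::string n) : name(std::move(n)) {}
    std::string str() const override { return name; }
};
struct Plus : Value {
    std::unique_ptr<Value> left, right;
    Plus(std::unique_ptr<Value> l, std::unique_ptr<Value> r)
        : left(std::move(l)), right(std::move(r)) {}
    std::string str() const override {
        return "(" + left->str() + " + " + right->str() + ")";
    }
};
struct Assignment {
    std::unique_ptr<LValue> target;
    std::unique_ptr<Value> source;
    Assignment(std::unique_ptr<LValue> t, std::unique_ptr<Value> s)
        : target(std::move(t)), source(std::move(s)) {}
    std::string str() const { return target->str() + " = " + source->str(); }
};
}  // namespace node

class Parser {
public:
    explicit Parser(std::string text) : src(std::move(text)) {}

    // assignment := lvalue '=' expr
    std::unique_ptr<node::Assignment> parseAssignment() {
        auto target = parseLValue();  // already a unique_ptr<LValue>
        expect('=');
        auto source = parseExpr();
        return std::make_unique<node::Assignment>(std::move(target), std::move(source));
    }

    // expr := primary ('+' primary)*   (left-associative)
    std::unique_ptr<node::Value> parseExpr() {
        auto left = parsePrimary();
        while (peek() == '+') {
            ++pos;
            auto right = parsePrimary();
            left = std::make_unique<node::Plus>(std::move(left), std::move(right));
        }
        return left;
    }

private:
    // primary := NUMBER | lvalue; the caller only ever sees a Value.
    std::unique_ptr<node::Value> parsePrimary() {
        if (std::isdigit(static_cast<unsigned char>(peek()))) {
            std::size_t start = pos;
            while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) ++pos;
            return std::make_unique<node::Number>(std::stol(src.substr(start, pos - start)));
        }
        return parseLValue();
    }

    // lvalue := IDENT (alphabetic characters only in this sketch)
    std::unique_ptr<node::LValue> parseLValue() {
        skipSpace();
        std::size_t start = pos;
        while (pos < src.size() && std::isalpha(static_cast<unsigned char>(src[pos]))) ++pos;
        if (start == pos) throw std::runtime_error("expected identifier");
        return std::make_unique<node::Variable>(src.substr(start, pos - start));
    }

    void skipSpace() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }
    char peek() { skipSpace(); return pos < src.size() ? src[pos] : '\0'; }
    void expect(char c) {
        if (peek() != c) throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos;
    }

    std::string src;
    std::size_t pos = 0;
};
```

Because parseLValue returns the more specific pointer type, its result can go straight into the Assignment constructor, while parsePrimary lets the same pointer decay to a plain Value wherever only a value is needed.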

rici


I get what you mean but again part of my problem is how can one tell the difference, even in a NodeValue, (e.g. in a plus where left and right can be values or variables) if it's a number or a variable being used? – Accumulator Mar 07 '17 at 18:06
@accumulator: The essence of object-oriented design is that every object does what it is asked to do. If you need to ask the object what it is in order to do something for it, then you are not properly encapsulating behaviour into objects. – rici Mar 07 '17 at 18:22
Then what is the proper encapsulation for something that can be a var or val? – Accumulator Mar 07 '17 at 18:45
@accumulator: if an object can be a var or a val, then it must be a val because it can do everything a val can do, and in that context you can't ask to do anything else. Which is why I said that var must be derived from val: a var is a val which can also do other things (like be assigned a value). – rici Mar 07 '17 at 19:06
i get it now :) – Accumulator Mar 07 '17 at 19:18
