Decoupling algorithm from data, when the algorithm needs knowledge of derived classes

Question

Sorry for the complicated title, but it's a bit hard to explain in just one sentence.

So I'm writing a simple interpreted language to help with some stuff that I often do. I have a lexer set up, feeding into an abstract syntax tree generator.

The abstract syntax tree generator produces Expressions, which I pass around as unique_ptrs. Several expression types derive from this base class, including:

Numbers
Variables
Function calls / prototypes
Binary operations

and so on. Each derived class contains the information it needs for that expression: variables hold a std::string with their identifier, and binary operations hold unique_ptrs to the left- and right-hand sides as well as a char for the operator.
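As a rough sketch of the hierarchy described above (the class and member names here are illustrative assumptions, not the actual code, and only three of the node types are shown):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class; the virtual destructor lets unique_ptr<Expr>
// destroy derived objects correctly.
struct Expr {
    virtual ~Expr() = default;
};

struct NumberExpr : Expr {
    double value;
    explicit NumberExpr(double v) : value(v) {}
};

struct VariableExpr : Expr {
    std::string name;  // the identifier
    explicit VariableExpr(std::string n) : name(std::move(n)) {}
};

struct BinaryExpr : Expr {
    char op;                         // the operator character
    std::unique_ptr<Expr> lhs, rhs;  // owned subexpressions
    BinaryExpr(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
};

// Builds the subtree for "z-4" from the example expression.
inline std::unique_ptr<Expr> makeZMinus4() {
    return std::make_unique<BinaryExpr>(
        '-',
        std::make_unique<VariableExpr>("z"),
        std::make_unique<NumberExpr>(4.0));
}
```

Everything is handed around as unique_ptr<Expr>, so the derived type is not visible to whoever receives the tree.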

Now this is working perfectly, and expressions are parsed just as they should be.

This is what an AST would look like for 'x=y*6^(z-4)+5'

   +--Assignment (=)--+
   |                  |
Var (x)   +--------BinOp (+)----+
          |                     |
          5     +------------BinOp (*)---+
                |                        |
   +---------BinOp (^)-------+         Var (y)
   |                         |
 Num (6)           +------BinOp (-)-----+
                   |                    |
                 Var (z)              Num (4)

The issue arises when trying to decouple the AST from the interpreter. I want to keep it decoupled in case I want to provide support for compilation in the future, or whatever. Plus the AST is already getting decently complex and I don't want to add to it. I only want the AST to have information about how to take tokens and convert them, in the right order, into an expression tree.

Now, the interpreter should be able to traverse this list of top down expressions, and recursively evaluate each subexpression, adding definitions to memory, evaluating constants, assigning definitions to their functions, etc. But, each evaluation must return a value so that I can recursively traverse the expression tree.

For example, a binary operation expression must recursively evaluate the left hand side and the right hand side, and then perform an addition of the two sides and return that.

Now, the issue is that the AST returns pointers to the base class, Expr, not the derived types. Calling getExpression returns the next expression regardless of its derived type, which lets me recursively evaluate binary operations and the like. For the interpreter to get the information inside these expressions (the number value or the identifier, for example), I would have to dynamically cast each expression and check whether the cast succeeded, and I'd have to do this repeatedly. Another way would be something like the Visitor pattern: the Expr calls the interpreter and passes this to it, which allows the interpreter to have a separate overload for each derived type. But again, the interpreter must return a value!

This is why I can't use the visitor pattern – I have to return values, which would completely couple the AST to the interpreter.

I also can't use a strategy pattern because each strategy returns wildly different things. The interpreter strategy would be too different from the LLVM strategy, for example.

I'm at a complete loss as to what to do here. One really clumsy solution would be to have an enum of the expression types as a member of the Expr base class, so the interpreter could check the type and then make the appropriate typecast. But that's ugly. Really ugly.
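For reference, the enum-plus-cast idea might look roughly like the sketch below. All names (ExprKind, kind, evaluate) are hypothetical, and only numbers and binary operations are covered:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

enum class ExprKind { Number, Binary };

struct Expr {
    const ExprKind kind;  // tag set once by each derived constructor
    explicit Expr(ExprKind k) : kind(k) {}
    virtual ~Expr() = default;
};

struct NumberExpr : Expr {
    double value;
    explicit NumberExpr(double v) : Expr(ExprKind::Number), value(v) {}
};

struct BinaryExpr : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;
    BinaryExpr(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : Expr(ExprKind::Binary), op(o), lhs(std::move(l)), rhs(std::move(r)) {}
};

// The interpreter lives outside the AST classes: it switches on the tag
// and downcasts with static_cast, which is safe only because the tag
// always matches the dynamic type.
inline double evaluate(const Expr& e) {
    switch (e.kind) {
    case ExprKind::Number:
        return static_cast<const NumberExpr&>(e).value;
    case ExprKind::Binary: {
        const auto& b = static_cast<const BinaryExpr&>(e);
        const double l = evaluate(*b.lhs);
        const double r = evaluate(*b.rhs);
        switch (b.op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        case '/': return l / r;
        }
        throw std::runtime_error("unknown operator");
    }
    }
    throw std::runtime_error("unknown expression kind");
}
```

The AST knows nothing about evaluation here, but every new consumer has to repeat the switch and keep it in sync with the enum.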

What are my options here? Thanks!

score 1 · Accepted Answer · answered May 08 '18 at 23:05

The usual answer (as done with most parser generators) is to have both a token type value and associated data (called attributes in discussions of such things). The type value is generally a simple integer that says "number", "string", "binary op", etc. When deciding which production to use, you examine only the token types, and when you get a match to a production rule you then know what kinds of tokens feed into that rule.

If you want to implement this yourself, look up parsing algorithms (LALR and GLR are a couple of examples). Alternatively, you could switch to a parser generator; then you only have to worry about getting your grammar correct and implementing the productions properly, not about implementing the parsing engine yourself.

SoronelHaetir


This project is mostly an exercise so I'll stick to parsing it myself. So you're saying to have an enum of the type *and* all the attributes on one class? The unused attributes are just left null, and when I know the type I can just access the appropriate attributes? Seems strange to expose undefined behavior like that. – Thor Correia May 09 '18 at 04:16
Correct about the type enum, only sort of for the token values. If you use a union (for the values as is typical in parser generators) you know which branch to use based on the token types that feed into each production. The typical implementation uses parallel stacks of token types and value instances, items are popped off the stacks when a rule is matched (and the result of the production pushed when the rule completes). – SoronelHaetir May 10 '18 at 17:41
Fascinating! Thanks for introducing me to unions. But it's still a little bit awkward to use a union – some types only hold one value (double, string, etc), some hold 2 (functions hold the name (string), and the vector), and some hold 3 (bin ops). But that's kind of interesting. Maybe I'll give something like this a whirl. – Thor Correia May 13 '18 at 05:13
Never mind! I totally get what you mean now. A struct that contains a tag enum, and a union of pointers to each expression. Brilliant! Thanks :) – Thor Correia May 13 '18 at 05:37
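
The tag-plus-union idea from the comments above can be sketched in C++17 with std::variant, which is a type-safe tagged union. This is a swapped-in standard-library equivalent, not the raw union of pointers mentioned in the comment, and the names Number, Variable, Binary, make and evaluate are assumptions for illustration:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

struct Expr;  // forward declaration so nodes can own child expressions
using ExprPtr = std::unique_ptr<Expr>;

struct Number   { double value; };
struct Variable { std::string name; };
struct Binary   { char op; ExprPtr lhs, rhs; };

// The variant stores exactly one alternative and remembers which one.
struct Expr {
    std::variant<Number, Variable, Binary> node;
};

// Hypothetical helper that wraps a node in a heap-allocated Expr.
template <class T>
ExprPtr make(T node) {
    return std::make_unique<Expr>(Expr{std::move(node)});
}

// The interpreter is a free function; env maps variable names to values.
inline double evaluate(const Expr& e, const std::map<std::string, double>& env) {
    return std::visit([&](const auto& n) -> double {
        using T = std::decay_t<decltype(n)>;
        if constexpr (std::is_same_v<T, Number>) {
            return n.value;
        } else if constexpr (std::is_same_v<T, Variable>) {
            return env.at(n.name);  // throws std::out_of_range if undefined
        } else {
            const double l = evaluate(*n.lhs, env);
            const double r = evaluate(*n.rhs, env);
            switch (n.op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            }
            throw std::runtime_error("unknown operator");
        }
    }, e.node);
}
```

Nodes with one, two, or three fields all fit, since each alternative is its own struct. An LLVM back end could be a second free function over the same Expr with a different return type.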

Liarokapis Alexandros · Answer 2 · 2018-05-09T11:40:28.267

Why can't you use the visitor pattern? Any return results simply become local state:

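class EvalVisitor
{
    // res() reads the last stored result; res(v) stores a new one.
    void visit(const X& x)
    {
         visit(x.y);            // evaluate the left child
         int res1 = res();
         visit(x.z);            // evaluate the right child
         int res2 = res();
         res(res1 + res2);      // store the combined result
    }
  ....
};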

The above can be abstracted away so that the logic lies in proper eval functions:

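class Visitor
{
public:
    virtual ~Visitor() = default;
    // Nodes are taken by reference; passing by value would copy them.
    virtual void visit(const X&) = 0;
    virtual void visit(const Y&) = 0;
    virtual void visit(const Z&) = 0;
};

class EvalVisitor : public Visitor
{
public:
    int eval(const X&);
    int eval(const Y&);
    int eval(const Z&);

    int result = 0;

    void visit(const X& x) override { result = eval(x); }
    void visit(const Y& y) override { result = eval(y); }
    void visit(const Z& z) override { result = eval(z); }
};

int evalExpr(Expr& x)
{
    EvalVisitor v;
    x.accept(v);
    return v.result;  // the result lives in the visitor, not in the expression
}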

Then you can do:

Expr& expr = ...;
int result = evalExpr(expr);
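
A fuller, self-contained sketch of the same idea, under the assumption that the AST has hypothetical NumberExpr and BinaryExpr nodes with an accept method (these names are not from the question's code):

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

struct NumberExpr;
struct BinaryExpr;

// The AST depends only on this abstract interface, never on a concrete
// interpreter or compiler back end.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const NumberExpr&) = 0;
    virtual void visit(const BinaryExpr&) = 0;
};

struct Expr {
    virtual ~Expr() = default;
    virtual void accept(Visitor& v) const = 0;
};

struct NumberExpr : Expr {
    double value;
    explicit NumberExpr(double v) : value(v) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

struct BinaryExpr : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;
    BinaryExpr(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

// The "return value" is kept as visitor state and read back after accept().
class EvalVisitor : public Visitor {
public:
    double eval(const Expr& e) {
        e.accept(*this);
        return result_;
    }
    void visit(const NumberExpr& n) override { result_ = n.value; }
    void visit(const BinaryExpr& b) override {
        const double l = eval(*b.lhs);  // copied out before the next eval overwrites result_
        const double r = eval(*b.rhs);
        switch (b.op) {
        case '+': result_ = l + r; return;
        case '-': result_ = l - r; return;
        case '*': result_ = l * r; return;
        case '/': result_ = l / r; return;
        default: throw std::runtime_error("unknown operator");
        }
    }
private:
    double result_ = 0.0;
};
```

Because accept returns void, a different visitor (say, one that keeps an LLVM value as its state) can reuse the same AST without changing it.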

This is an option… but it would be a little bit messy. I'll wait and see if I get other responses, if not I may go with this. Thanks! — Thor Correia, May 09 '18 at 04:18