Semantic Checks after the AST creation

Question

I created a scanner and a parser (with flex and bison respectively) and an AST to implement a Java-Python translator. I don't understand how to manage semantic checks on the AST (type checking, variable declaration checking, ...): where should I insert the functions that implement these checks, and how do I connect the symbol table (which I created) to the AST? Consider, for example, this production in the parser:

VariableDeclaration
                   : VariableName                               {$$ = varDec_new($1,NULL);}
                   | VariableName ASSIGNOP ExpressionStatement  {$$ = varDec_new($1,$3);}
                ;

With varDec_new defined as follows in ast.c:

ast_node *varDec_new(ast_node *variableName, ast_node *exprStmt)
{
    ast_node *n = newast(AST_VARDEC);      /* allocate an ast_node of kind AST_VARDEC */
    n->varDec.variableName = variableName; /* pointer to the variableName struct in the AST */
    n->varDec.exprStmt = exprStmt;         /* pointer to the exprStmt struct in the AST (NULL if no initializer) */
    return n;
}

How can I manage type checking (between VariableName and ExpressionStatement)? Do I have to write a function in ast.c that takes the entire AST as a parameter, or do I have to call a checking function from the parser whenever I reach a production that requires type checking?

Which library do you use apart from flex/bison? are you doing everything from scratch? how do you validate your code does the right thing? what does "a Java-Python translator" mean, actually? do you want to run Java code using a python runtime? if so, did you consider writing a Java bytecode interpreter in Python? I have so many questions about this. Regarding your question, you first build an AST, then you do several passes on it, do not do everything at once. You need a context (an environment) where you put information about compilation units, and then you walk the tree with typing rules. — coredump, Sep 27 '18 at 11:36
Yes, you need to do a post-order traverse of the AST, typechecking at each node (probably recursively). [Wikipedia](https://en.wikipedia.org/wiki/Tree_traversal) might get you started. My advice is to do the entire traverse on the complete AST *after the parse is finished* rather than trying to do it piecemeal in every production. — rici, Sep 27 '18 at 16:13
Thank you @rici, so I should create a function to traverse the whole tree in post-order? And if I create this function, where do I call all the functions for the semantic checks? It is precisely the connection between the AST and the semantic checks that I cannot understand. — Mick, Sep 27 '18 at 20:26
@Mick: a semantic check is a function on an AST node. For example, if the node represents an addition, then the two child operands need to be of arithmetic type and the result type will be the composite of those two types. Or if the node represents a function call, then the child representing the function needs to be a function, and the children representing arguments must be the correct size and type for the prototype of the function. Etc. — rici, Sep 27 '18 at 21:01
@rici But should the semantic checks be done in the same function that creates the node? Following your example of the addition node, when I create the node structure with the respective function, where do I call the function for the semantic type check? Within the same function used to create the node? — Mick, Sep 27 '18 at 21:34
@Mick: I think I have been quite clear that my advice is that you do the semantic checks at the end during a walk of the AST, *not* when you create the AST node. What did I say which led you to think that I might not have meant that? — rici, Sep 27 '18 at 21:50
Sorry @rici, I misunderstood. So I have to do a post-order traversal of a non-binary tree (using a recursive function), and at each recursion step I check with a case construct which type of node I am on and execute the semantic checks (and the translation to Python) related to that node, right? — Mick, Sep 28 '18 at 08:38
@mick: that's basically the suggestion. The AST might or might not be binary; usually they are mostly binary, but that depends on you, really. Case constructs are one possible way of implementing a walk but there are others depending on the language you are writing in. You don't have to do everything in a single walk. It's often more convenient to do multiple passes. Good luck. — rici, Sep 28 '18 at 13:38
Hi @rici, I created the function to traverse the AST, but now I have a problem with the symbol table (I have to traverse the tree first to populate the symbol table and then traverse it again to implement the semantic checks). I need to know what kind of symbol table implementation I should use that fits the AST structure and lets me manage scopes (implementation language: C). — Mick, Oct 15 '18 at 10:28
@Mick: For C, scoping is relatively simple; a scope more or less corresponds to a block, which is probably a node in the AST. The big exception is that the scope only starts with the declaration, which is somewhere in the middle of the block. If you don't allow redeclaration of variables in a block, the implementation might be slightly simpler. Anyway, that's the basic approach I'd take: use a symbol table for each block and link each table with its outer scope (that is, the table for the innermost surrounding block with symbols). — rici, Oct 15 '18 at 16:16
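The "one table per block, linked to its outer scope" idea from the last comment could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names `scope`, `symbol`, `type_kind`, `scope_push`, `scope_pop`, `scope_declare` and `scope_lookup` are hypothetical and do not come from the question's code, and each table is a plain linked list rather than a hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tags; a real translator would have its own set. */
typedef enum { TY_ERROR, TY_INT, TY_FLOAT, TY_STRING } type_kind;

typedef struct symbol {
    const char *name;
    type_kind type;
    struct symbol *next;   /* next symbol declared in the same scope */
} symbol;

typedef struct scope {
    symbol *symbols;       /* symbols declared in this block */
    struct scope *outer;   /* enclosing scope, NULL for the outermost one */
} scope;

/* Open a scope nested inside `outer` (call when the walk enters a block). */
scope *scope_push(scope *outer)
{
    scope *s = malloc(sizeof *s);
    assert(s != NULL);
    s->symbols = NULL;
    s->outer = outer;
    return s;
}

/* Free a scope and return its parent (call when the walk leaves the block). */
scope *scope_pop(scope *s)
{
    scope *outer = s->outer;
    while (s->symbols) {
        symbol *next = s->symbols->next;
        free(s->symbols);
        s->symbols = next;
    }
    free(s);
    return outer;
}

/* Look a name up in one scope only. */
symbol *scope_lookup_local(const scope *s, const char *name)
{
    for (symbol *sym = s->symbols; sym; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;
    return NULL;
}

/* Look a name up in the innermost scope first, then in each outer scope. */
symbol *scope_lookup(const scope *s, const char *name)
{
    for (; s; s = s->outer) {
        symbol *sym = scope_lookup_local(s, name);
        if (sym)
            return sym;
    }
    return NULL;
}

/* Declare a name; returns 0 on success, -1 if already declared in this block. */
int scope_declare(scope *s, const char *name, type_kind type)
{
    if (scope_lookup_local(s, name))
        return -1;
    symbol *sym = malloc(sizeof *sym);
    assert(sym != NULL);
    sym->name = name;      /* not copied: assumes the AST owns the string */
    sym->type = type;
    sym->next = s->symbols;
    s->symbols = sym;
    return 0;
}
```

During the AST walk, a block node would call `scope_push` on entry and `scope_pop` on exit, a declaration node would call `scope_declare`, and a variable use would call `scope_lookup`. Because symbols are added only when the walk reaches the declaration, a use that precedes its declaration in the same block is not found locally, which matches the "scope starts with the declaration" remark.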

user2346536 · Answer 1 · 2018-10-05T15:10:46.673

Have a look here for your symbol table to be available in your semantic actions:

Here is very simplified C++-style pseudocode for the semantic check of an assignment, though it is better to run it once the tree is complete:

bool storeNodeType(symtable* sym, node* assignment)
{
    // Every return reports whether the assignment is valid with respect to types.
    switch (assignment->RHS_node_kind)
    {
        case '+':
             left_type = getNodeType(sym, assignment->RHS->left);
             right_type = getNodeType(sym, assignment->RHS->right);
             [ ... apply_type_coercion_rules to obtain found_type ... ]
             return sym->store(found_type, assignment->left->var);

        [ ... plenty of other cases ... ]

        case t_Var:
             // The RHS is a plain variable, so its type is already known.
             return sym->store(assignment->RHS->var->vartype, assignment->left->var);
    }
    return false; // unknown kind of right-hand side
}

It assumes that every variable used on the right-hand side already has a type; if that is not the case, you'll have to type the tree in a second pass once bison is done.
You basically perform a post-order traversal of the RHS and deduce the type of every sub-expression based on your type coercion rules for each operator. Leaf nodes will be either variables of known type or constants. So you type the RHS "bottom-up".
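A minimal C sketch of that bottom-up typing, assuming a tiny expression node with literals, variables and `+` only. The `expr` struct, the enums and the functions `type_of` and `add_result` are hypothetical names for illustration, and the coercion rule for `+` is a placeholder, not Java's actual rule set.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TY_ERROR, TY_INT, TY_FLOAT, TY_STRING } type_kind;
typedef enum { EX_INT_LIT, EX_FLOAT_LIT, EX_STRING_LIT, EX_VAR, EX_ADD } expr_kind;

typedef struct expr {
    expr_kind kind;
    type_kind var_type;         /* EX_VAR: type already found in the symbol table */
    struct expr *left, *right;  /* EX_ADD: the two operands */
    type_kind type;             /* result, filled in by type_of() */
} expr;

/* Placeholder coercion rule for '+': int + float gives float,
 * string + string gives string, any other mix with a string is an error. */
static type_kind add_result(type_kind l, type_kind r)
{
    if (l == TY_ERROR || r == TY_ERROR)
        return TY_ERROR;
    if (l == TY_STRING || r == TY_STRING)
        return (l == r) ? TY_STRING : TY_ERROR;
    return (l == TY_FLOAT || r == TY_FLOAT) ? TY_FLOAT : TY_INT;
}

/* Post-order walk: type the children first, then the node itself. */
type_kind type_of(expr *e)
{
    assert(e != NULL);
    switch (e->kind) {
    case EX_INT_LIT:    e->type = TY_INT;      break;
    case EX_FLOAT_LIT:  e->type = TY_FLOAT;    break;
    case EX_STRING_LIT: e->type = TY_STRING;   break;
    case EX_VAR:        e->type = e->var_type; break;
    case EX_ADD: {
        type_kind l = type_of(e->left);
        type_kind r = type_of(e->right);
        e->type = add_result(l, r);
        break;
    }
    default:            e->type = TY_ERROR;    break;
    }
    return e->type;
}
```

Storing the result in each node means a later pass (for example the Python code generator) can read `e->type` without recomputing it, and an error in a sub-expression propagates upward as `TY_ERROR` instead of producing a cascade of follow-up errors.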
In sym::store you have to decide whether you accept trans-typing (string <- int, for instance) and/or handle type coercion again (int <- float, for instance). Return errors accordingly.
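The decision made inside the store step can be written as a small compatibility table. The sketch below is in C and purely illustrative: `check_assign`, `assign_result` and the policy in the table (same type accepted, int to float flagged as widening, float to int flagged as narrowing, everything else rejected) are assumptions, to be replaced by whatever rules the translator actually wants.

```c
#include <assert.h>

typedef enum { TY_ERROR, TY_INT, TY_FLOAT, TY_STRING, TY_COUNT } type_kind;

typedef enum {
    ASSIGN_OK,        /* same type on both sides */
    ASSIGN_WIDENS,    /* accepted with an implicit conversion (float <- int) */
    ASSIGN_NARROWS,   /* lossy conversion (int <- float): warn or reject */
    ASSIGN_MISMATCH   /* incompatible types (e.g. string <- int): error */
} assign_result;

/* rules[target][value]: outcome of storing a `value` into a `target` variable. */
static const assign_result rules[TY_COUNT][TY_COUNT] = {
    /* target TY_ERROR  */ { ASSIGN_MISMATCH, ASSIGN_MISMATCH, ASSIGN_MISMATCH, ASSIGN_MISMATCH },
    /* target TY_INT    */ { ASSIGN_MISMATCH, ASSIGN_OK,       ASSIGN_NARROWS,  ASSIGN_MISMATCH },
    /* target TY_FLOAT  */ { ASSIGN_MISMATCH, ASSIGN_WIDENS,   ASSIGN_OK,       ASSIGN_MISMATCH },
    /* target TY_STRING */ { ASSIGN_MISMATCH, ASSIGN_MISMATCH, ASSIGN_MISMATCH, ASSIGN_OK       },
};

/* Classify the assignment `target <- value` according to the table above. */
assign_result check_assign(type_kind target, type_kind value)
{
    assert((int)target < TY_COUNT && (int)value < TY_COUNT);
    return rules[target][value];
}
```

Keeping the policy in one table makes it easy to change a single cell, for instance turning `string <- int` into an accepted conversion, without touching the tree walk that calls `check_assign`.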
