How to properly reuse the symbol table in a multi-pass compiler

Question

I'm currently building a multi-pass compiler for a block-structured language. I have built a scope stack in the semantic analysis phase. When entering a new scope, I create a new table, push it onto the stack and make it the current scope table; all symbols declared inside this scope are inserted into the current table. When leaving a scope, the current table is recorded in the AST node and then popped from the scope stack.
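A minimal C sketch of the scheme described above, assuming fixed-size tables and a hypothetical `BlockNode` AST type (none of these names come from the question): entering a scope pushes a fresh table, declarations go into the top table, lookups search from the innermost table outward, and leaving a scope pops the table and stores it in the block's AST node instead of freeing it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: fixed capacities, overflow is caught by assert. */
#define MAX_SYMS  64
#define MAX_DEPTH 32

typedef struct {
    const char *name;
    int slot;                 /* hypothetical payload, e.g. a frame offset */
} Symbol;

typedef struct {
    Symbol syms[MAX_SYMS];
    int count;
} SymTable;

typedef struct {
    SymTable *tables[MAX_DEPTH];
    int depth;
} ScopeStack;

/* Hypothetical AST node for a block; only the field that matters here. */
typedef struct {
    SymTable *scope;          /* filled in when the analyser leaves the block */
} BlockNode;

/* Push a fresh, empty table and make it the current scope. */
SymTable *enter_scope(ScopeStack *st) {
    assert(st->depth < MAX_DEPTH);
    SymTable *t = calloc(1, sizeof *t);
    assert(t != NULL);
    st->tables[st->depth++] = t;
    return t;
}

/* Pop the current table but keep it alive by storing it in the AST. */
void leave_scope(ScopeStack *st, BlockNode *node) {
    assert(st->depth > 0);
    node->scope = st->tables[--st->depth];
}

/* Insert a symbol into the current (innermost) table. */
Symbol *declare(ScopeStack *st, const char *name, int slot) {
    assert(st->depth > 0);
    SymTable *t = st->tables[st->depth - 1];
    assert(t->count < MAX_SYMS);
    Symbol *s = &t->syms[t->count++];
    s->name = name;
    s->slot = slot;
    return s;
}

/* Search from the innermost table outward; NULL if undeclared. */
Symbol *lookup(const ScopeStack *st, const char *name) {
    for (int d = st->depth - 1; d >= 0; d--) {
        SymTable *t = st->tables[d];
        for (int i = t->count - 1; i >= 0; i--)
            if (strcmp(t->syms[i].name, name) == 0)
                return &t->syms[i];
    }
    return NULL;
}
```

Because `leave_scope` hands the table to the AST rather than freeing it, a later pass can push `node->scope` back onto a stack without rebuilding it.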

This way, the code generation phase does not have to build the symbol table all over again. Whenever it enters a new scope, it can simply get the table from the AST node and push it onto the scope stack. I think this is the approach most compiler textbooks recommend.

In most cases this works fine; however, there is a corner case I don't know how to handle properly. Consider the following code:

```c
int a = 1;
int b = 2;

void test()
{
    int a = b;
    int b = 3;
}
```

It has two scopes: the global scope and test()'s scope. So, to generate code for test(), we have to:

1. Push the global symbol table onto the scope stack.
2. Get test()'s symbol table from its AST node and push it onto the scope stack.

Now, when handling "int a = b;", the lookup finds the local variable b on the scope stack, which is obviously wrong, since the local b has not been declared yet at that point.
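To make the corner case concrete, here is a hypothetical sketch of what the code generator sees if it pushes the recorded tables as they are. By then test()'s table is complete, so it already contains the local `b`, and an innermost-first lookup returns it. The `Symbol`/`SymTable` layout and the `is_local` flag are illustrative assumptions, not taken from the question.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *name;
    int is_local;             /* illustrative: 1 for test()'s locals */
} Symbol;

typedef struct {
    const Symbol *syms;
    int count;
} SymTable;

/* The tables as semantic analysis left them: already complete. */
static const Symbol global_syms[] = { {"a", 0}, {"b", 0} };
static const Symbol test_syms[]   = { {"a", 1}, {"b", 1} };

/* Innermost table first, exactly as during semantic analysis. */
static const Symbol *lookup(const SymTable *stack[], int depth, const char *name) {
    for (int d = depth - 1; d >= 0; d--)
        for (int i = 0; i < stack[d]->count; i++)
            if (strcmp(stack[d]->syms[i].name, name) == 0)
                return &stack[d]->syms[i];
    return NULL;
}

/* Code generation for `int a = b;`: push the recorded tables, then look up b. */
const Symbol *codegen_resolve_b(void) {
    const SymTable global = { global_syms, 2 };
    const SymTable test   = { test_syms, 2 };
    const SymTable *stack[] = { &global, &test };   /* global first, then test() */
    return lookup(stack, 2, "b");                   /* finds the local b */
}
```

Under C rules this result is wrong: the initializer should refer to the global `b`.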

Any idea how to deal with this problem? Do I have to destroy all the symbols when leaving a scope and build the symbol table all over again in the code generation phase?

Thanks guys!

Depending on the semantic definition of your source language, this may be a moot point. C#, for example, _defines_ `test()` (in this case) to be the scope, so your code above will yield an error on the `b` reference because it hasn't been declared yet. That is, the local declaration hides the global one even before it appears in the code sequence. — 500 - Internal Server Error, Mar 05 '21 at 09:18
Thanks for the hint about how C# would handle this. But I want the semantics to be similar to C's, so I expect it to find the global variable b until the local variable b is declared. Any ideas? — Light XX, Mar 05 '21 at 09:34
Why do you feel the need to construct the scope tree while parsing? Does your grammar require name resolution in order to disambiguate? (As with C/C++). If so, you might want to revisit that language design decision. :-) — rici, Mar 06 '21 at 02:35
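One possible way to get the C-like behaviour asked for in the comments out of reused tables (not suggested in this thread; shown only as an illustration) is to give each symbol a declaration sequence number during semantic analysis, record the number of declarations seen so far at each use, and skip any symbol declared after the use. The `seq` field, the `use_seq` value and `lookup_before` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *name;
    int seq;                  /* hypothetical: order in which it was declared */
    int is_local;
} Symbol;

typedef struct {
    const Symbol *syms;
    int count;
} SymTable;

/* Innermost table first; a symbol only counts if declared before the use. */
const Symbol *lookup_before(const SymTable *stack[], int depth,
                            const char *name, int use_seq) {
    for (int d = depth - 1; d >= 0; d--)
        for (int i = 0; i < stack[d]->count; i++) {
            const Symbol *s = &stack[d]->syms[i];
            if (s->seq < use_seq && strcmp(s->name, name) == 0)
                return s;
        }
    return NULL;
}

/* Globals a, b get seq 0, 1; test()'s locals a, b get seq 2, 3. */
static const Symbol global_syms[] = { {"a", 0, 0}, {"b", 1, 0} };
static const Symbol test_syms[]   = { {"a", 2, 1}, {"b", 3, 1} };

/* `b` in `int a = b;` is reached after local a (seq 2) but before
   local b (seq 3), so the analyser would have recorded use_seq = 3. */
const Symbol *resolve_b(int use_seq) {
    const SymTable global = { global_syms, 2 };
    const SymTable test   = { test_syms, 2 };
    const SymTable *stack[] = { &global, &test };
    return lookup_before(stack, 2, "b", use_seq);
}
```

With `use_seq = 3` the local `b` is skipped and the global one is found; a use after `int b = 3;` (`use_seq = 4`) sees the local `b`.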

score 3 · Accepted Answer · edited Mar 30 '22 at 20:46


One solution to this problem is to let the AST node for an identifier contain a link to the specific symbol found in the symbol table at the time the node is created. Assuming the program text is parsed sequentially from beginning to end, one statement at a time, this yields the correct symbol. It also removes the need for repeated lookups in the symbol table.
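A minimal C sketch of this idea, assuming scopes are linked lists of symbols and using a hypothetical `IdentNode` type: the node for `b` in `int a = b;` is bound while only the local `a` has been declared (in C, a variable's scope starts at the end of its declarator, before its initializer), so it points at the global `b`. Code generation can then read `node.sym` directly instead of looking the name up again.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Symbol {
    const char *name;
    int is_local;
    struct Symbol *next;      /* next symbol in the same scope, newest first */
} Symbol;

typedef struct Scope {
    Symbol *syms;
    struct Scope *parent;
} Scope;

/* Hypothetical AST node for an identifier use: resolved once, then reused. */
typedef struct {
    const char *name;
    Symbol *sym;
} IdentNode;

static Symbol *declare(Scope *sc, const char *name, int is_local) {
    Symbol *s = malloc(sizeof *s);
    assert(s != NULL);
    s->name = name;
    s->is_local = is_local;
    s->next = sc->syms;
    sc->syms = s;
    return s;
}

/* Innermost scope first, then the enclosing ones. */
static Symbol *lookup(const Scope *sc, const char *name) {
    for (; sc != NULL; sc = sc->parent)
        for (Symbol *s = sc->syms; s != NULL; s = s->next)
            if (strcmp(s->name, name) == 0)
                return s;
    return NULL;
}

/* Create the node and bind it to whatever symbol is visible right now. */
static IdentNode make_ident(const Scope *sc, const char *name) {
    IdentNode n = { name, lookup(sc, name) };
    return n;
}

/* Analyse test()'s body in source order; returns the node for `b` in `int a = b;`. */
IdentNode analyse_test_body(Scope *global, Scope *local) {
    local->syms = NULL;
    local->parent = global;
    declare(local, "a", 1);                   /* int a ...        */
    IdentNode init = make_ident(local, "b");  /*       ... = b;   */
    declare(local, "b", 1);                   /* int b = 3;       */
    return init;
}
```

The symbols are never freed in this sketch; a real compiler would typically own them through its scopes or an arena.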

edited Mar 30 '22 at 20:46 by JCLL

answered Mar 08 '21 at 16:49 by Johan

This is known as annotating the AST. Like you said, it can be used to optimize symbol lookups, but it can also be used to store type information during semantic checking. – joeyvanlierop Mar 22 '23 at 20:16
