1

I'm currently building a compiler for a language that has global variables and nested subroutines. Previously, I've only built compilers for languages that have only local variables and no nested subroutines.

I have a problem with reusing, in the code generation phase, the symbol table that was filled in during semantic analysis. I implemented the symbol table as a stack of linked lists, where each linked list holds the identifiers declared in a particular scope. Every time the analyzer enters a scope, a new list is created and pushed onto the stack, and it becomes the current scope. Likewise, every time it leaves a scope, the list on top of the stack is popped. In the end, after semantic analysis finishes, I practically have an empty symbol table, just like when it started. However, the code generator needs a completely filled symbol table to generate code correctly. How can this be done without redoing what was already done during semantic analysis (i.e. entering identifiers into the symbol table)?

LeleDumbo
  • Have you considered structuring your symbol table as a tree rather than a stack, so at the end there is an entire tree of scopes? – david.pfx Feb 08 '16 at 14:06
  • Do you mean so that it follows the AST hierarchy? I think it's better for the respective scope data to be part of the AST node, just like in the accepted answer; that way I don't need to traverse twice upon entering/exiting a scope. – LeleDumbo Feb 08 '16 at 18:18
  • No, the scope follows its own hierarchy. You keep all variables and the scopes they belong to for many reasons, including a symbolic listing and the debugger. The AST should still point to the symbols that were used in the parse. – david.pfx Feb 09 '16 at 07:06
  • You can add another field, called nesting level, to the symbol table to implement nested procedures. – Anil Kumar Mar 04 '16 at 08:54
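
The scope-tree and nesting-level suggestions in the comments above could look roughly like the following. This is only a sketch under assumptions: the `Scope` class, its `level` field, and the `listing` helper are hypothetical names, not part of the asker's compiler. Leaving a scope just moves the "current" pointer back to the parent, so nothing is discarded.

```python
class Scope:
    """One node in a tree of scopes (hypothetical structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        # Nesting level: 0 for the global scope, +1 per nested subroutine.
        self.level = 0 if parent is None else parent.level + 1
        self.symbols = {}
        if parent is not None:
            parent.children.append(self)

    def declare(self, name, type_name):
        # Each entry records the nesting level it was declared at.
        self.symbols[name] = {"type": type_name, "level": self.level}


def listing(scope):
    """Return a symbolic listing of the whole scope tree, one line per symbol."""
    lines = []
    for name, info in scope.symbols.items():
        lines.append(f"{scope.name}.{name}: {info['type']} (level {info['level']})")
    for child in scope.children:
        lines.extend(listing(child))
    return lines


root = Scope("global")
root.declare("counter", "int")

current = Scope("outer", root)      # enter subroutine "outer"
current.declare("a", "int")
current = Scope("inner", current)   # enter nested subroutine "inner"
current.declare("b", "int")

current = current.parent            # leave "inner": nothing is deleted
current = current.parent            # leave "outer"

for line in listing(root):
    print(line)
```

After analysis, `current` is back at the root, but every scope is still reachable from it through `children`.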

2 Answers

5

You have to decide how much context your compiler is going to retain to support optimization and code generation.

You can build a purely on-the-fly code generator that throws away a scope's symbol table information on leaving that scope, provided it has already generated all the code (or IR) it is going to generate for that scope. This can work if you are building a quick-and-dirty compiler, and it is useful when your computer doesn't have much memory. (On modern PCs, the memory argument no longer applies.)

If you don't do any analysis, optimization, IR construction, or code generation until parsing is complete, then you'll have to hang onto the per-scope symbol table information longer. You'll also discover that you have to hang onto the ASTs, or you'll have nothing to generate code from. (On modern PCs, this is not an issue.)

To build a compiler with a simple architecture, you probably want to isolate the parsing, semantic analysis, and code generation passes anyway. In this case, your parser (pass 1) runs and just builds an AST; don't bother building a symbol table. Pass 2 walks the tree and builds symbol tables that correspond to parts of the AST, and keeps that relationship; now you have ASTs and associated symbol tables. Pass 3 can now walk the ASTs and use the symbol information to generate an IR. Pass 4 optimizes the IR; it may still reference symbol table entries decorated with type information and possible storage location assignments. After that, you can do further optimizations and final code generation.
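
A toy version of passes 2 and 3 might look like this. It is a sketch under assumptions, not a definitive design: the `Proc` node, the `Scope` class, the `analyze` and `generate` functions, and the textual "IR" are all invented for illustration, and the AST is built by hand in place of a real parser.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    parent: "Scope | None" = None
    symbols: dict = field(default_factory=dict)
    depth: int = 0

    def lookup(self, name):
        """Search this scope, then the enclosing ones; return (slot, depth)."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name], scope.depth
            scope = scope.parent
        raise KeyError(name)


@dataclass
class Proc:
    """Hypothetical AST node: a procedure with declarations and a body."""
    name: str
    decls: list          # names of variables declared here
    body: list           # variable names that are used, or nested Proc nodes
    scope: Scope = None  # attached by pass 2 and kept for pass 3


def analyze(proc, parent=None):
    """Pass 2: build one symbol table per procedure and attach it to the node."""
    depth = 0 if parent is None else parent.depth + 1
    proc.scope = Scope(parent, {}, depth)
    for slot, name in enumerate(proc.decls):
        proc.scope.symbols[name] = slot
    for item in proc.body:
        if isinstance(item, Proc):
            analyze(item, proc.scope)
        else:
            proc.scope.lookup(item)  # raises KeyError for undeclared names


def generate(proc, out=None):
    """Pass 3: walk the AST again, reading the tables that pass 2 left behind."""
    out = [] if out is None else out
    out.append(f"proc {proc.name}")
    for item in proc.body:
        if isinstance(item, Proc):
            generate(item, out)
        else:
            slot, depth = proc.scope.lookup(item)
            out.append(f"load {item} depth={depth} slot={slot}")
    out.append(f"end {proc.name}")
    return out


# Pass 1 stand-in: a hand-built AST with one nested procedure.
inner = Proc("inner", ["x"], ["x", "g"])
main = Proc("main", ["g"], ["g", inner])

analyze(main)
ir = generate(main)
print("\n".join(ir))
```

Because each `Proc` keeps its own `scope`, pass 3 never has to re-enter identifiers; it only looks them up.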

The main point of all this is: don't throw the symbol tables away. Save them and associate them with the code structures you need for code generation. You have lots of memory to save them in.

Ira Baxter
  • This is actually the same as the accepted answer, but it takes some digging in the last two paragraphs. As I can only accept one answer, I've given an upvote. – LeleDumbo Feb 08 '16 at 08:48
  • Agreed, but it is also a more elaborate and better-reasoned answer. +1 – Mark Nov 14 '16 at 00:32
2

This is going to be a bit abstract, as is your question, since I don't know anything concrete about your compiler's internal data structures.

When you pop your scope, instead of deleting it (as I assume you do now), store the pointer to the scope data in a member of whatever data structure your code generation for that scope is based on, so that the code generator can get to it.
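
As a rough illustration, assuming the AST node is the structure that code generation is based on: the names `Scope`, `BlockNode`, `SymbolTable`, and `leave_scope` below are hypothetical, and a dictionary stands in for the linked list of identifiers.

```python
class Scope:
    """Declarations of one scope, linked to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


class BlockNode:
    """Hypothetical AST node for a construct that opens a scope."""

    def __init__(self, name):
        self.name = name
        self.scope = None  # set when semantic analysis leaves the scope


class SymbolTable:
    """Stack of scopes, as in the question, except that pop keeps the data."""

    def __init__(self):
        self.stack = []

    def enter_scope(self):
        parent = self.stack[-1] if self.stack else None
        self.stack.append(Scope(parent))

    def declare(self, name, info):
        self.stack[-1].symbols[name] = info

    def leave_scope(self, node):
        # Pop as before, but hand the scope to the AST node instead of dropping it.
        node.scope = self.stack.pop()


table = SymbolTable()
program = BlockNode("program")
proc = BlockNode("proc")

table.enter_scope()               # global scope
table.declare("g", "global int")
table.enter_scope()               # nested subroutine scope
table.declare("x", "local int")
table.leave_scope(proc)
table.leave_scope(program)

# The stack is empty again, yet the code generator can still resolve names
# through the nodes.
print(len(table.stack))
print(proc.scope.lookup("x"))
print(proc.scope.lookup("g"))
```

Each scope keeps a reference to its parent, so a lookup from a nested subroutine's node still reaches the global variables after the stack has been emptied.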

  • I was so stupid. Indeed, I can make the pointer to the scope data a member of the corresponding AST node. It can be thrown away along with the AST later. – LeleDumbo Feb 08 '16 at 06:26