Record ownership in symbol tables

Question

I am implementing a symbol table as described in the dragon book:

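#include <string>
#include <unordered_map>

// Record is the symbol table entry type; its definition is not shown here.
class SymbolTable {
    std::unordered_map<std::string, Record> table;
    SymbolTable* parent;  // enclosing scope, or nullptr for the outermost scope

public:
    SymbolTable(SymbolTable* p) : parent{p} {}

    // Search this scope first, then each enclosing scope in turn.
    const Record* lookUp(const std::string& name) const {
        for (auto* scope = this; scope != nullptr; scope = scope->parent) {
            auto iter = scope->table.find(name);
            if (iter != cend(scope->table))
                return &iter->second;
        }
        return nullptr;
    }

    // Returns false if the name is already declared in this scope.
    bool insert(const std::string& name, const Record& record) {
        return table.insert({name, record}).second;
    }
};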

However, I am not sure how to store the record data. Who should own the type information? Should Record contain a non-owning pointer to the type already stored in the AST?

Also, I would like to keep my symbol table around for later compiler passes. Cooper & Torczon briefly mention directly inserting pointers to the appropriate SymbolTable in the AST node. Is that the common approach?

rici · Accepted Answer · 2020-05-08T15:35:53.410

2

The lookup for names in records usually doesn't follow the bottom-up approach implemented using a parent pointer from scope to scope. (In fact, that simple data structure may not be entirely applicable to scopes either; as soon as you introduce lexical closures, your scope relationships become more complicated.)

Although there are languages which will do implicit lookup from a structure to the containing structure's members, they're rare and experience shows that this form of name lookup is prone to difficulty, even though it occasionally seems convenient.

The most common pattern is that a structure type contains a list of members, each with its own type. That list of members is, in effect, a symbol table, since in order to parse a member reference like r.a.b.c, you need to search for a in r's members, then b in r.a's members, and so on. That suggests that a structure type should contain a symbol table of members (which might or might not be a pointer, depending on your design). Typically, member lists of a structure are not shared, but in the case of OO subclass/superclass relationships, member lookup can be more complicated.
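As a rough illustration of that member-by-member search, here is a minimal sketch. The names `Type`, `members` and `resolveMemberPath` are hypothetical and not taken from the question, and the sketch assumes that some other object owns every `Type`, so the member table only holds non-owning pointers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type representation: every type may carry a member table.
struct Type {
    std::string name;
    // Member name -> member type. Empty for scalar types such as "int".
    // The pointers are non-owning; something else is assumed to own each Type.
    std::unordered_map<std::string, const Type*> members;
};

// Resolve a chain such as r.a.b.c: start from the type of `r` and look up
// each member name in the member table of the type reached so far.
// Returns nullptr as soon as a name along the path is not a member.
const Type* resolveMemberPath(const Type* base,
                              const std::vector<std::string>& path) {
    const Type* current = base;
    for (const std::string& member : path) {
        if (current == nullptr) return nullptr;
        auto it = current->members.find(member);
        if (it == current->members.end()) return nullptr;
        current = it->second;
    }
    return current;
}
```

Note that the lookup never falls back to an enclosing structure: a name that is missing from the current type's members is simply an error.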

I guess the point I'm trying to make here is that the structure of your symbol table depends a lot on the nature of your language. At its core, a symbol table contains a list of symbols organized in a way which makes it efficient to look up a symbol by its name. The symbol table associates each symbol with some symbol data object, which might vary from symbol table type to symbol table type (for example, using C++ templates) or might be consistent across all symbol tables. Often, symbol tables differ from simple hash tables (or associative containers) by the fact that the symbols also have some kind of linear ordering, used to produce a linear representation at compile time. Precise details will vary, but being able to iterate over the symbols in a consistent, well-defined order is often an important feature.
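One possible way to combine fast lookup with a well-defined order is to keep the entries in a vector and index them by name. This is only a sketch under that assumption; `OrderedSymbolTable`, `find` and `namesInOrder` are made-up names, and other layouts (an intrusive linked list through the entries, for instance) would work as well.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical symbol table, generic in the per-symbol data it stores.
template <typename Data>
class OrderedSymbolTable {
    struct Entry {
        std::string name;
        Data data;
    };
    std::vector<Entry> entries;                          // declaration order
    std::unordered_map<std::string, std::size_t> index;  // name -> position

public:
    // Returns false (and stores nothing) if the name already exists.
    bool insert(const std::string& name, Data data) {
        const bool inserted = index.emplace(name, entries.size()).second;
        if (inserted) entries.push_back({name, std::move(data)});
        return inserted;
    }

    // Returns nullptr if the name is unknown. In this sketch the pointer is
    // invalidated by a later insert, because the vector may reallocate.
    const Data* find(const std::string& name) const {
        auto it = index.find(name);
        return it == index.end() ? nullptr : &entries[it->second].data;
    }

    // Names in the order in which they were declared.
    std::vector<std::string> namesInOrder() const {
        std::vector<std::string> names;
        for (const Entry& entry : entries) names.push_back(entry.name);
        return names;
    }
};
```

If persistent pointers to entries matter more than contiguous storage, a `std::deque` or a container of `std::unique_ptr<Entry>` could replace the vector.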

By the general principle of separation of concerns, a symbol table as described above should not also attempt to be a container of symbol tables. The symbol table can answer questions about the names it contains. Searching through multiple symbol tables (scope search, or whatever) is best done with a different object, which knows how to handle name lookup failure in some symbol table but doesn't need to understand the technical details of a single name lookup.
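A minimal sketch of that split, assuming simple block-structured scoping: the table only answers questions about its own names, and a separate object (here a hypothetical `ScopeStack`) decides where to look next when a lookup fails. The placeholder `Record` and the member names are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Record {
    std::string typeName;  // placeholder for whatever an entry really holds
};

// Knows only about the names it contains.
class SymbolTable {
    std::unordered_map<std::string, Record> table;

public:
    bool insert(const std::string& name, const Record& record) {
        return table.insert({name, record}).second;
    }
    const Record* find(const std::string& name) const {
        auto it = table.find(name);
        return it == table.end() ? nullptr : &it->second;
    }
};

// Knows how scopes relate to each other, but not how a single table works.
class ScopeStack {
    std::vector<const SymbolTable*> scopes;  // non-owning, outermost first

public:
    void push(const SymbolTable& scope) { scopes.push_back(&scope); }
    void pop() { scopes.pop_back(); }  // precondition: at least one scope

    // Try the innermost scope first and fall back to the enclosing ones.
    const Record* lookUp(const std::string& name) const {
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
            if (const Record* record = (*it)->find(name)) return record;
        }
        return nullptr;
    }
};
```

A language with closures or class hierarchies would need a different search policy, but only `ScopeStack` would have to change.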

Whether you can keep persistent pointers or references to a symbol table depends entirely on your low-level design. If that's your wish, it's easily accomplished. I think it is pretty common, but I can't speak for the huge variety of language implementations out there.

Symbol tables do not always interrelate in simple ways which can easily be expressed as ownership. In that, they are similar to other internal objects floating around in a compiler. An AST node might suddenly become a shared node in a graph rather than being a tree node, once you start to implement common-subexpression optimisations. (And that's just one example.) As far as I know, most compilers of any complexity end up implementing some kind of garbage collection for internal objects, unless of course the compiler is written in a language with general garbage collection.

edited May 08 '20 at 15:35

answered May 08 '20 at 15:04

rici


Thanks! Note that I used a class `record` here to denote a symbol table entry (as seen in C&T), not a C-style `struct` type. If I understand correctly, in my type hierarchy, `RecordType` must contain the names/types of its data members (right now, it only knows its own name. The children info is in the `RecordDecl` node). Then, when I encounter a `MemberExpr` node (accessing data member), I can look up in the type of the base expression for the existence of the name? Do I really need the types of the children too? Don't the names suffice? – Touloudou May 08 '20 at 15:46
@touloudou: i still have a copy of C&T somewhere but it's too much effort to unearth it :-) So I don't quite get your distinction between record and struct types. Afaics they're both named tuples and the record/struct type needs to hold the name and type of each member. The name is not sufficient... where are you going to look up the type? (Early C put all struct members in the global scope, like enums. What a disaster! That's why you still see painfully prefixed names of struct members in system interfaces. Lesson: namespaces are important.) – rici May 08 '20 at 16:11
I just meant that `record` here meant `The stuff I dump in the symbol table`, not an actual `Record`/`Struct`. :) Ok, that makes complete sense. Thanks! – Touloudou May 08 '20 at 16:44
@Touloudou: You should normally reference complete objects rather than trying to extract partial data. With the complete object, you don't need to reduplicate the already-existent data access methods and you avoid an unnecessary dependency on implementation details of unrelated objects. Concretely, a symbol table associates a Symbol with a Type, possibly in a way which preserves an order. It should not need to look into the internals of Type to do that. And a named tuple type has an associated SymbolTable which associates (member) Symbols with (member) Types. No need to special case. – rici May 08 '20 at 17:39
It's possible that I misunderstood again. But I think it's always good advice. – rici May 08 '20 at 17:40
we're on the same page now. I thought that the symbol table was mapping a name to an object, that would itself contain many things required for semantic analysis and code generation (I haven't reached that part yet...). I thought that the type was just one piece of it. Also, you mentioned using a data structure that preserves the insertion order. Is that required to check for previously declared variables in the same scope? Would you happen to have a reference on the subject? I didn't see it mentioned in the dragon book or C&T. Thanks for all your feedback! – Touloudou May 08 '20 at 18:38
@touloudou: Sooner or later you will want to generate a storage layout for the record (or, similarly, for the stack frame). It's not strictly necessary that storage be allocated in declaration order, but it tends to surprise people when their named tuples are scrambled in memory. (And some languages let you access members of named tuples in other ways than their names.) Keeping the insertion order in the symbol table will facilitate this. But it is not strictly necessary and there are other ways to implement it. I just usually find that it helps. – rici May 08 '20 at 18:47
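To make the layout point concrete, here is a small sketch that assigns offsets to members in declaration order. It assumes a conventional C-like rule (each member is placed at the next offset that satisfies its alignment, and the total size is padded to the largest alignment); the names `Member`, `Layout` and `layOut` are hypothetical, and a real target may have different rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Member {
    std::string name;
    std::size_t size;   // size in bytes
    std::size_t align;  // required alignment in bytes, assumed to be >= 1
};

struct PlacedMember {
    std::string name;
    std::size_t offset;
};

struct Layout {
    std::vector<PlacedMember> members;  // same order as the declarations
    std::size_t size;
    std::size_t align;
};

// Round value up to the next multiple of align.
std::size_t roundUp(std::size_t value, std::size_t align) {
    return (value + align - 1) / align * align;
}

// Walk the members in declaration order, which is exactly the order an
// insertion-ordered symbol table would hand back.
Layout layOut(const std::vector<Member>& membersInDeclarationOrder) {
    Layout layout{{}, 0, 1};
    std::size_t offset = 0;
    for (const Member& member : membersInDeclarationOrder) {
        offset = roundUp(offset, member.align);
        layout.members.push_back({member.name, offset});
        offset += member.size;
        if (member.align > layout.align) layout.align = member.align;
    }
    layout.size = roundUp(offset, layout.align);
    return layout;
}
```

With an unordered container, the iteration order (and therefore the layout) could change between runs or standard library versions, which is the surprise described above.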
@touloudou: It's certainly possible that you'll want to put more information in the symbol table than just the type. I should have been clearer about that. For example, it's useful for error messages and debugging information that every symbol be associated with the source location of its definition. Some languages allow you to provide attributes in addition to types. (Other languages consider attributes to be part of the type.) You may be able to attach a specific constant value, not just a type, to a name. Or some indication of a range of values. (not null, always +ve, etc.) – rici May 08 '20 at 18:56
Other information, such as that which results from def/use control flow analysis or similar optimisation techniques, might better be put in a different data structure, since it is not applicable to every symbol type (and some of the data I just mentioned might fall into this category, too). For annotating certain symbols with additional data, it is handy if a symbol table node reference is not only persistent but also hashable, so you can use it as a key in an attribute table. Hope all that helps a bit. – rici May 08 '20 at 18:58
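A sketch of such an attribute table, under the assumption that symbols live in a `std::unordered_map`: that container keeps pointers and references to its elements valid across insertions and rehashing (only erasing an element invalidates them), and `std::hash` is defined for pointers, so the address of an entry can serve as a key. `Symbol`, `UseInfo` and `recordUse` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct Symbol {
    std::string typeName;  // placeholder for the data every symbol has
};

// Element addresses stay valid as the table grows.
using SymbolTable = std::unordered_map<std::string, Symbol>;

// Hypothetical analysis result that only some symbols will ever need.
struct UseInfo {
    int useCount = 0;
};

// Side table keyed by the (persistent, hashable) address of a symbol.
using UseTable = std::unordered_map<const Symbol*, UseInfo>;

// Count one more use of the symbol, creating its entry on first use.
void recordUse(UseTable& uses, const Symbol& symbol) {
    ++uses[&symbol].useCount;
}
```

The symbol table itself stays small, and each later pass can keep its own side table without touching the `Symbol` definition.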
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/213446/discussion-between-touloudou-and-rici). – Touloudou May 08 '20 at 19:30
