
Often I hear that using a symbol table optimizes lookups of symbols in a programming language. Currently, my language is implemented only as an interpreter, not as a compiler. I do not yet want to allocate the time to build a compiler, so I'm attempting to optimize the interpreter. The language is based on Scheme semantics and syntax for the most part, and is statically scoped. My interpreter executes code directly from the AST at runtime; the AST is implemented as discriminated unions, just like the AST in Write Yourself a Scheme in 48 Hours.

Unfortunately, symbol lookup in my interpreter is slow due to the use of an F# Map to contain and look up symbols by name. (Well, in truth, it uses a Trie, but the performance is similarly problematic.) I would like to instead use a symbol table to achieve faster symbol lookup. However, I don't know if or how one can implement symbol tables in an interpreter; I hear about them only in the context of a compiler.

Is this possible? If the implementation strategy or performance differs from a symbol table in a compiler, could you describe the differences? Finally, is there an existing reference implementation of a symbol table in an interpreter I might look at?

Thank you!

Bryan Edds
  • A _symbol table_ is anything used to store symbols. It doesn't refer to a particular data structure (AFAIK). – Daniel Jul 17 '12 at 14:43
  • Have you considered using a hash table? I realize you probably want persistence (presumably so each scope has its own table), but you can achieve this with a linked list of "scopes" each having a read-only dictionary of symbols. Symbol resolution would consist of starting with the innermost scope and traversing outwards, checking each symbol table along the way. – Daniel Jul 17 '12 at 14:46
  • Daniel - I think the use of a symbol table implies at least the property of constant-time lookup. That's my guess from what literature I've read, at any rate. – Bryan Edds Jul 17 '12 at 14:48
  • I think this is a bit like your last question: without knowing why `Map` is too slow and how fast is fast enough only general advice can be given, not precise remedies. A hash table would provide _O(1)_ lookup per scope, so _O(m)_, where _m_ is the number of scopes, overall. I can't imagine it not being fast enough for an interpreter. – Daniel Jul 17 '12 at 15:57
  • One thing that throws me a bit is that Jon Harrop specifically said to use symbol tables, and I presume he had a specific data structure in mind. However, I'm not sure if I made it clear that I need the optimization for an interpreter rather than a compiler. Sorry for these unclear questions - I'm groping around in the dark a bit. I'm hoping Jon pops in here to clarify. – Bryan Edds Jul 17 '12 at 17:28
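Daniel's suggestion of a linked chain of scopes, each holding a read-only hash table, might look roughly like the minimal F# sketch below. The `Value` type, the `Scope` record and the `makeScope`/`resolve` functions are hypothetical stand-ins, not part of the asker's interpreter; only the .NET `Dictionary` and `IReadOnlyDictionary` types are real library APIs.

```fsharp
open System.Collections.Generic

// Hypothetical value type, standing in for the interpreter's own.
type Value =
    | Number of int
    | Str of string

// A scope is a read-only dictionary of symbols plus a link to its parent.
// The innermost scope is the head of the chain.
type Scope =
    { Symbols : IReadOnlyDictionary<string, Value>
      Parent : Scope option }

// Build a new scope from a list of bindings, linked to an optional parent.
let makeScope (bindings : (string * Value) list) (parent : Scope option) : Scope =
    let table = Dictionary<string, Value>()
    for (name, value) in bindings do
        table.[name] <- value
    { Symbols = table :> IReadOnlyDictionary<string, Value>; Parent = parent }

// Walk outward from the innermost scope. Each step is an O(1) hash lookup,
// so resolving a name costs O(m) in the number of enclosing scopes m.
let rec resolve (name : string) (scope : Scope) : Value option =
    match scope.Symbols.TryGetValue name with
    | true, value -> Some value
    | false, _ ->
        match scope.Parent with
        | Some parent -> resolve name parent
        | None -> None

// Usage: an inner scope that shadows x.
let globals = makeScope [ "x", Number 1; "y", Number 2 ] None
let inner = makeScope [ "x", Number 10 ] (Some globals)
```

Here `resolve "x" inner` finds the inner binding first, `resolve "y" inner` falls through to `globals`, and an unbound name yields `None`. Each scope's table is never mutated after construction, which gives the per-scope persistence the comment mentions.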

1 Answer


A symbol table associates some information with every symbol. In an interpreter, you would perhaps associate values with symbols. Map is one implementation particularly suitable for functional interpreters.
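As a minimal sketch of what a Map-based environment looks like (assuming, for brevity, that values are plain `int`s, with hypothetical `extend`/`lookup` helpers): because `Map` is persistent, extending an environment returns a new map and leaves the old one intact, which is what lets closures capture their environment cheaply. Each lookup, however, is a balanced-tree search costing O(log n) string comparisons.

```fsharp
// Environment mapping symbol names to values (ints here for brevity).
type Env = Map<string, int>

// Adding a binding returns a new map; an existing key is replaced in the
// new map only, so the original environment is untouched.
let extend (name : string) (value : int) (env : Env) : Env = Map.add name value env

// Look a symbol up by name; an unbound name is a runtime error.
let lookup (name : string) (env : Env) : int =
    match Map.tryFind name env with
    | Some v -> v
    | None -> failwithf "unbound symbol: %s" name

// Usage: the inner environment shadows x without mutating the outer one.
let outer : Env = Map.empty |> extend "x" 1
let inner : Env = outer |> extend "x" 2
```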

If you want to optimize your interpreter, get rid of the need for a symbol table at runtime. One way to go is De Bruijn indexing.
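The translation step that makes a runtime symbol table unnecessary can be sketched as a separate pass over the AST, run once before evaluation. The `namedExp` type and `toDeBruijn` function below are hypothetical; the pass keeps a list of the names currently in scope (a compile-time symbol table) and replaces each variable with the position of its binder in that list, so index 0 is the innermost binding.

```fsharp
// Named input syntax (hypothetical, for illustration).
type namedExp =
    | NApp of namedExp * namedExp
    | NConst of int
    | NFn of string * namedExp   // fun param -> body
    | NVar of string

// Indexed output syntax: a variable is the distance to its binder.
type exp =
    | App of exp * exp
    | Const of int
    | Fn of exp
    | Var of int

// The list of names in scope is consulted only during this pass,
// not on every variable access at runtime.
let toDeBruijn (e : namedExp) : exp =
    let rec go (scope : string list) e =
        match e with
        | NApp (f, x) -> App (go scope f, go scope x)
        | NConst n -> Const n
        | NFn (param, body) -> Fn (go (param :: scope) body)
        | NVar name ->
            match List.tryFindIndex ((=) name) scope with
            | Some i -> Var i
            | None -> failwithf "unbound variable: %s" name
    go [] e
```

For example, `fun x -> fun y -> x` becomes `Fn (Fn (Var 1))`, because one binder (`y`) sits between the use of `x` and the `Fn` that binds it. Unbound variables are reported here, before the program runs, and shadowing falls out naturally because `List.tryFindIndex` finds the innermost name first.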

There is also nice literature on mechanically deriving optimized interpreters, VMs and compilers from a functional interpreter, for example:

http://www.brics.dk/RS/03/14/BRICS-RS-03-14.pdf

For a simple example, consider the lambda calculus extended with integer constants, with variables encoded as De Bruijn indices. Notice that the evaluator gets by without a symbol table, because it looks variables up by integer index.

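// Lambda calculus with integer constants; a variable is a De Bruijn index,
// i.e. the number of binders between it and the Fn that binds it.
type exp =
    | App of exp * exp
    | Const of int
    | Fn of exp
    | Var of int

type value =
    | Closure of exp * env
    | Number of int

and env = value []

// Index 0 is the innermost binding, so lookup is a plain array access.
let lookup env i = Array.get env i
// Push a new innermost binding (note: this copies the whole array).
let extend value env = Array.append [| value |] env
let empty () : env = Array.empty

let eval exp =
    let rec eval env exp =
        match exp with
        | App (f, x) ->
            match eval env f with
            | Closure (bodyF, envF) ->
                // Evaluate the argument, then run the body in the closure's
                // environment extended with the argument at index 0.
                let vx = eval env x
                eval (extend vx envF) bodyF
            | _ -> failwith "cannot apply a non-function value"
        | Const x -> Number x
        | Fn e -> Closure (e, env)
        | Var x -> lookup env x
    eval (empty ()) exp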
t0yv0
  • Thanks for the link - I'll work through the material! However, I'm having trouble googling a reference for 'DeBruijn encoding' in the context of an interpreter. – Bryan Edds Jul 17 '12 at 15:19
  • @BryanEdds, I think the more precise term is De Bruijn indexing. Added a link to Wikipedia. – t0yv0 Jul 17 '12 at 15:23
  • Thanks for the link. That's rather high-falutin for me, however, as I've no idea how to translate to actual code in my interpreter :) Is there some reference I could use to understand how to connect the theory to practice in my program? – Bryan Edds Jul 17 '12 at 15:44
  • @BryanEdds, keep researching, it takes some time to get the intuition for this but it is worth the while. I added a simple example demonstrating interpreting De Bruijn-indexed code without a symbol table. – t0yv0 Jul 17 '12 at 16:10
  • It seems like this would result in poor error messages (no variable names?). – Daniel Jul 17 '12 at 16:19
  • @Daniel, what does it have to do with optimizing interpreters? You can also carry names around for debugging. – t0yv0 Jul 17 '12 at 16:27
  • @toyvo, I was wondering if array appending in this algorithm would become quite inefficient, since it yields O(n^2) copying? I don't see a way to elide this cost in an interpreter. – Bryan Edds Jul 30 '12 at 12:44
  • @BryanEdds, have you read the paper? My example only demonstrates De Bruijn indexing. It does not demonstrate the most efficient interpreter possible. The paper explains the technique to obtain an abstract machine and then a VM from an interpreter. An optimized VM would use imperative data structures such as `Stack` and bytecode for maximum locality. – t0yv0 Jul 30 '12 at 13:11
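Short of deriving the full VM from the paper, one simple way to avoid the copying that the array-based `extend` in the answer incurs is to swap the array for an immutable linked list. This is a substituted technique, not the answer's code or the paper's method: `extend` becomes an O(1) cons that shares the captured environment, at the price of an O(i) walk in `lookup`. The names mirror the answer's example; `apply` is a hypothetical helper introduced to keep the matches flat.

```fsharp
type exp =
    | App of exp * exp
    | Const of int
    | Fn of exp
    | Var of int

type value =
    | Closure of exp * env
    | Number of int

and env = value list

// Cons is O(1) and shares the existing list, so extending never copies.
let extend (v : value) (e : env) : env = v :: e

// Walks i cells; De Bruijn indices are bounded by lexical nesting depth.
let lookup (e : env) (i : int) : value = List.item i e

let eval (program : exp) : value =
    let rec go (e : env) (x : exp) : value =
        match x with
        | App (f, arg) -> apply (go e f) (go e arg)
        | Const n -> Number n
        | Fn body -> Closure (body, e)
        | Var i -> lookup e i
    // Run a closure's body with the argument bound at index 0.
    and apply (fv : value) (argv : value) : value =
        match fv with
        | Closure (body, closureEnv) -> go (extend argv closureEnv) body
        | Number _ -> failwith "cannot apply a non-function value"
    go [] program
```

For instance, `eval (App (Fn (Var 0), Const 42))` applies the identity function to 42. A stack-and-bytecode VM, as the comment suggests, would still give better locality than this list-based environment.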