
I'm currently developing a dynamically typed language.

One of the main problems I'm facing during development is how to do fast runtime symbol lookups.

For general, free global and local symbols I simply index them and let each scope (global or local) keep an array of the symbols and quickly look them up using the index. I'm very happy with this approach.
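To make the scheme concrete, here is a minimal sketch of what I mean. The `Scope` class and its `resolve` method are hypothetical names made up for this illustration, not my actual implementation:

```python
class Scope:
    """Hypothetical compile-time scope: maps each name to a fixed slot index."""

    def __init__(self):
        self.slots = {}  # name -> index, used only while compiling

    def resolve(self, name):
        # Hand out the next free index the first time a name is seen.
        return self.slots.setdefault(name, len(self.slots))


scope = Scope()
x = scope.resolve("x")
y = scope.resolve("y")
x_again = scope.resolve("x")  # same name, same index

# At run time the frame is a plain array; no name lookup is needed.
frame = [None] * len(scope.slots)
frame[x] = 10
frame[y] = 20
total = frame[x_again] + frame[y]
```

The dictionary is only consulted while compiling; the running code touches nothing but integer indices.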

However, for attributes in objects the problem is much harder. I can't use the same indexing scheme on them, because I have no idea which object I'm currently accessing, thus I don't know which index to use!

Here's a Python example that reflects what I want to work in my language:

from random import random

class A:
    def __init__(self):
        self.a = 10
        self.c = 30

class B:
    def __init__(self):
        self.c = 20

def test():
    # Pick a class at random, so the type of foo is unknown until run time.
    if random() < 0.5:
        foo = A()
    else:
        foo = B()
    # There could even be an eval here that sets foo
    # to something different or removes attribute c from foo.
    print(foo.c)

Does anyone know any clever tricks to do the lookup quickly? I know about hash maps and splay trees, so I'm interested in whether there is a way to make it as efficient as my other lookups.

monoceres
  • Does your language include all the other things which make this hard in general, such as adding and removing attributes of an object during its life time, and `getattr`/`setattr`/`delattr`? –  May 31 '13 at 15:46
  • Yes! I don't know about the *attr methods, but it will definitely be possible to change the object and the attributes during its lifetime. – monoceres May 31 '13 at 15:48

2 Answers


Once you've reached the point where looking up properties in a hash table isn't fast enough, the standard next step is inline caching. You can do this in JIT-compiled implementations, or even in bytecode compilers or interpreters, though it seems to be less common there.
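As a rough illustration of the idea, here is a minimal sketch of a one-entry inline cache. It assumes every object carries a pointer to a shared layout descriptor; `Shape`, `Obj` and `GetAttrSite` are hypothetical names invented for this example and are not taken from any real VM:

```python
class Shape:
    """Hypothetical layout descriptor: attribute name -> slot index."""

    def __init__(self, index_of):
        self.index_of = index_of


class Obj:
    def __init__(self, shape, slots):
        self.shape = shape
        self.slots = slots


class GetAttrSite:
    """One attribute-access site in the program, with a one-entry cache."""

    def __init__(self, name):
        self.name = name
        self.cached_shape = None
        self.cached_index = None
        self.misses = 0

    def load(self, obj):
        if obj.shape is self.cached_shape:
            # Fast path: one identity check, then an array index.
            return obj.slots[self.cached_index]
        # Slow path: do the full lookup and remember the result.
        self.misses += 1
        index = obj.shape.index_of[self.name]
        self.cached_shape = obj.shape
        self.cached_index = index
        return obj.slots[index]


shape_a = Shape({"a": 0, "c": 1})
shape_b = Shape({"c": 0})
site = GetAttrSite("c")  # models the `foo.c` access in test()

results = [
    site.load(Obj(shape_a, [10, 30])),  # miss, caches shape_a
    site.load(Obj(shape_a, [11, 31])),  # hit
    site.load(Obj(shape_b, [20])),      # miss, caches shape_b
]
```

The cache belongs to the access site, not to the object, so it pays off when the same site mostly sees objects with the same layout. A site that keeps alternating between layouts would miss every time with a single entry; keeping a few entries per site is one way to soften that.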

If the shape of your objects can change over time (i.e. you can add new properties at runtime) you'll probably end up doing something similar to V8's hidden classes.

munificent
  • Very useful links. The inline caching scheme is really smart! Even with changing objects I think it can be a winner. Whenever the object changes you could just mark the object as "dead" so all current caches to the object are invalidated! – monoceres Jun 01 '13 at 09:12
  • Ah, I knew I'd forgotten something. I have my doubts about applying inline caching to interpreters. I think MRI's method cache (for methods only) does something similar, by patching the bytecode instructions. That seems to me to be the only way to achieve the caching without huge extra cost. It's notoriously inefficient, though; that may be due to the specific implementation and Ruby programmers' habits. –  Jun 01 '13 at 09:49

A technique known as maps can store the values for each attribute in a compact array. The knowledge of which attribute name corresponds to which index is maintained in an auxiliary data structure (the eponymous map), so you don't immediately gain a performance benefit (though it does use memory more efficiently if many objects share a set of attributes). With a JIT compiler, you can make the map persistent and constant-fold lookups, so the final machine code can use constant offsets into the attributes array (for constant attribute names).
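Here is a minimal sketch of the data layout, assuming attributes are only ever added (deletion is left out). `Map`, `Instance`, `with_attr` and `EMPTY_MAP` are hypothetical names for this illustration, not the API of any real VM:

```python
class Map:
    """Hypothetical shared layout: attribute name -> index into storage."""

    def __init__(self, index_of=None):
        self.index_of = index_of or {}
        self.transitions = {}  # attribute name -> Map with that name added

    def with_attr(self, name):
        # Reuse the existing successor so equal layouts share one Map.
        if name not in self.transitions:
            new_index = dict(self.index_of)
            new_index[name] = len(new_index)
            self.transitions[name] = Map(new_index)
        return self.transitions[name]


EMPTY_MAP = Map()


class Instance:
    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []  # compact array of values, no names stored here

    def set(self, name, value):
        index = self.map.index_of.get(name)
        if index is None:
            # New attribute: move to the successor map and grow storage.
            self.map = self.map.with_attr(name)
            self.storage.append(value)
        else:
            self.storage[index] = value

    def get(self, name):
        return self.storage[self.map.index_of[name]]


p, q = Instance(), Instance()
for obj, (a, c) in ((p, (10, 30)), (q, (1, 3))):
    obj.set("a", a)
    obj.set("c", c)
```

Both objects end up pointing at the very same `Map`, so the name-to-index table exists once no matter how many instances there are. Because a map is never mutated after creation, a JIT that has checked `obj.map` against a known map can treat the index as a constant.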

In an interpreter (I'll assume byte code), things are much harder because you don't have much opportunity to specialize code for specific objects. However, I have an idea myself for turning attribute names into integral keys. Maintain a global mapping assigning integral IDs to attribute names. When adding new byte code to the VM (loading from disk or compiling in memory), scan for strings used as attributes, and replace them with the associated ID, creating a new ID if the string hasn't been seen before. Instead of storing hash tables or similar mappings on each object - or in the map, if you use maps - you can now use sparse arrays, which are hopefully more compact and faster to operate on.

I haven't had a chance to implement and test this, and you still need a sparse array, unless you want every object (or map) to take as many words of memory as there are distinct attribute names in the whole program. At least you can replace string hash tables with integer hash tables. Just by tuning a hash table for IDs as keys, you can make several optimizations: skip the hash function (use the ID as the hash), remove some indirection and hence cache misses, save yourself the complexity of dealing with pathologically bad hash functions, etc.
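Untested in a real VM, but the idea could be sketched roughly like this. `intern_attr` and `IdTable` are hypothetical names; the table assumes a power-of-two capacity, linear probing, and no deletion:

```python
_ids = {}  # global: attribute name -> small integer ID


def intern_attr(name):
    """Done once when bytecode is loaded, not on every access."""
    return _ids.setdefault(name, len(_ids))


class IdTable:
    """Open-addressing table keyed by attribute ID; the ID is its own hash."""

    def __init__(self, capacity=8):  # capacity must be a power of two
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.count = 0

    def _slot(self, attr_id):
        mask = len(self.keys) - 1
        i = attr_id & mask  # no hash function call
        while self.keys[i] is not None and self.keys[i] != attr_id:
            i = (i + 1) & mask  # linear probing
        return i

    def set(self, attr_id, value):
        if (self.count + 1) * 2 > len(self.keys):
            self._grow()  # keep the table at most half full
        i = self._slot(attr_id)
        if self.keys[i] is None:
            self.keys[i] = attr_id
            self.count += 1
        self.values[i] = value

    def get(self, attr_id):
        i = self._slot(attr_id)
        if self.keys[i] is None:
            raise AttributeError(attr_id)
        return self.values[i]

    def _grow(self):
        old = [(k, v) for k, v in zip(self.keys, self.values) if k is not None]
        self.keys = [None] * (len(self.keys) * 2)
        self.values = [None] * len(self.keys)
        self.count = 0
        for k, v in old:
            self.set(k, v)


A_ID, C_ID = intern_attr("a"), intern_attr("c")
foo = IdTable()
foo.set(A_ID, 10)
foo.set(C_ID, 30)
```

The bytecode for `foo.c` would then carry `C_ID` instead of the string `"c"`, and the lookup is a mask, a compare and an array read in the common case. Since IDs are handed out densely, attributes that were interned close together land in neighbouring slots and rarely collide.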

  • This is very interesting! It's actually bytecode and your second idea sounds great! I had actually thought of something similar myself, but since I did not know about the sparse array data structure, I simply thought the space requirements would be too large! I can now re-evaluate this :) – monoceres Jun 01 '13 at 09:08