I'm writing a toy compiler and want my language to support virtual methods, but I have no idea how to do it. It doesn't seem as straightforward as other statements, which can easily be turned into IR code without a second thought. The v-table concept exists in my mind only as boxes and lines, like a high-level illustration. That may be enough for using an OOP language, but it doesn't seem to be enough for writing one.

I tried writing some C++ code and compiling it to IR, but sadly I still cannot understand the output. I checked the source code of Clang and at first couldn't even figure out where this part sits. (I did find the code, which seems to be located in lib/CodeGen/CGClass.cpp, but Clang is a complicated project and I still cannot understand how it implements the v-table.)

So, any idea how to do this, or is there some LLVM API that would help me implement it?

1 Answer

A vtable is an array of function pointers. In a single-inheritance context, you'd have one such array per class where the elements of the array are the class's virtual methods. Each object would then contain a pointer to its class's vtable and each virtual method call would simply invoke the corresponding pointer in the vtable (after casting it to the needed type).

So let's say you're compiling a program that looks like this:

class A {
  int x,y;

  virtual int foo() { return x+y; }
  virtual int bar() { return x*y; }
}

class B inherits A {
  int z;
  override int bar() { return x*y+z; }
}

int f(A a) {
  return a.foo() + a.bar();
}

Then you could define functions named A_foo, A_bar and B_bar taking an A or B pointer and containing the code for A.foo, A.bar and B.bar respectively (the exact naming would depend on your name mangling scheme of course). Then you'd generate two globals A_vtable and B_vtable that'd look like this:

@A_vtable = global [2 x void (...)*] [
  void (...)* bitcast (i32 (%struct.A*)* @A_foo to void (...)*),
  void (...)* bitcast (i32 (%struct.A*)* @A_bar to void (...)*)
]
@B_vtable = global [2 x void (...)*] [
  void (...)* bitcast (i32 (%struct.A*)* @A_foo to void (...)*),
  void (...)* bitcast (i32 (%struct.B*)* @B_bar to void (...)*)
]

Which would correspond to this C code (which is hopefully more readable):

typedef void (*fpointer_t)();
fpointer_t A_vtable[] = {(fpointer_t) A_foo, (fpointer_t) A_bar};
fpointer_t B_vtable[] = {(fpointer_t) A_foo, (fpointer_t) B_bar};
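
The tables only help if every object carries a pointer to its class's table. As a minimal sketch of that part (the struct layout, the `vtable` field name, and the `A_init`/`B_init` constructors are assumptions made for illustration, not something a particular compiler prescribes), the object layout and initialization could look like this in C:

```c
/* Generic slot type; entries are cast back to their real type before a call. */
typedef void (*fpointer_t)(void);

/* Assumed layout: the vtable pointer is the first field of every object. */
struct A {
  const fpointer_t *vtable;
  int x, y;
};

/* B embeds A as its first member, so a pointer to a B is also a valid pointer to an A. */
struct B {
  struct A base;
  int z;
};

int A_foo(struct A *self) { return self->x + self->y; }
int A_bar(struct A *self) { return self->x * self->y; }
/* Takes the base pointer and downcasts, which keeps later casts strictly valid C. */
int B_bar(struct A *self) {
  return self->x * self->y + ((struct B *)self)->z;
}

const fpointer_t A_vtable[] = {(fpointer_t)A_foo, (fpointer_t)A_bar};
const fpointer_t B_vtable[] = {(fpointer_t)A_foo, (fpointer_t)B_bar};

/* Hypothetical constructors: the only vtable-related work is storing one pointer. */
void A_init(struct A *self, int x, int y) {
  self->vtable = A_vtable;
  self->x = x;
  self->y = y;
}

void B_init(struct B *self, int x, int y, int z) {
  A_init(&self->base, x, y);    /* run the base constructor first */
  self->base.vtable = B_vtable; /* then switch to the derived class's table */
  self->z = z;
}
```

Because the vtable pointer sits at offset 0 in both structs, code that only knows about `A` can find the table of any subclass instance without knowing its real type.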

f could then be translated like this:

define i32 @f(%struct.A*) {
  %2 = getelementptr inbounds %struct.A, %struct.A* %0, i64 0, i32 0
  %3 = bitcast %struct.A* %0 to i32 (%struct.A*)***
  %4 = load i32 (%struct.A*)**, i32 (%struct.A*)*** %3
  %5 = load i32 (%struct.A*)*, i32 (%struct.A*)** %4
  %6 = call i32 %5(%struct.A* %0)

  %7 = load void (...)**, void (...)*** %2
  %8 = getelementptr inbounds void (...)*, void (...)** %7, i64 1
  %9 = bitcast void (...)** %8 to i32 (%struct.A*)**
  %10 = load i32 (%struct.A*)*, i32 (%struct.A*)** %9
  %11 = call i32 %10(%struct.A* %0)

  %12 = add nsw i32 %11, %6
  ret i32 %12
}

Or in C:

typedef int (*A_int_method_t)(struct A*);
int f(struct A* a) {
  return ((A_int_method_t) a->vtable[0])(a) + ((A_int_method_t) a->vtable[1])(a);
}
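
To see the dispatch end to end, here is a self-contained C sketch that splits one virtual call into the same steps as the IR: load the vtable pointer, load the slot, cast, call. The `call_virtual` helper and the struct layout are assumptions for illustration. `B_bar` takes a `struct A*` and downcasts internally so that the call through `A_int_method_t` is strictly valid C, whereas the IR version simply bitcasts.

```c
#include <stddef.h>

/* Generic slot type; entries are cast back to their real type before a call. */
typedef void (*fpointer_t)(void);

struct A {
  const fpointer_t *vtable; /* assumed to be the first field */
  int x, y;
};

struct B {
  struct A base; /* A's fields come first, so a B* can be used as an A* */
  int z;
};

typedef int (*A_int_method_t)(struct A *);

int A_foo(struct A *self) { return self->x + self->y; }
int A_bar(struct A *self) { return self->x * self->y; }
int B_bar(struct A *self) {
  struct B *b = (struct B *)self; /* safe only because self really is a B */
  return self->x * self->y + b->z;
}

const fpointer_t A_vtable[] = {(fpointer_t)A_foo, (fpointer_t)A_bar};
const fpointer_t B_vtable[] = {(fpointer_t)A_foo, (fpointer_t)B_bar};

/* One virtual call, broken into the same steps as the IR. */
int call_virtual(struct A *obj, size_t slot) {
  const fpointer_t *table = obj->vtable;         /* load the vtable pointer */
  fpointer_t entry = table[slot];                /* index the table, load the slot */
  A_int_method_t method = (A_int_method_t)entry; /* cast to the real signature */
  return method(obj);                            /* indirect call, object as receiver */
}

int f(struct A *a) {
  return call_virtual(a, 0) + call_virtual(a, 1);
}
```

With `struct A a = {A_vtable, 2, 3};` the call `f(&a)` evaluates to (2+3) + (2*3) = 11, while for `struct B b = {{B_vtable, 2, 3}, 4};` the call `f(&b.base)` picks up `B_bar` through slot 1 and evaluates to 5 + (6+4) = 15, even though `f` itself never mentions `B`.
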
sepp2k
  • Is there any way to reduce the entry size? Let's say we have ten virtual functions in class A and only one function is overridden by class B. Does that mean I have to put the other 9 entries in B's vtable as well? If the inheritance tree is large, isn't the vtable larger for child classes? –  Feb 27 '19 at 01:06
  • @reavenisadesk The thing is that `f` doesn't know anything about `B`, it simply accesses the given `A*`'s vtable at the index corresponding to the method it wants to call. For this to work, the vtable must have at least as many entries as `f` expects it to have and all the entries must be valid function pointers (but it's not like putting null pointers in there would save anything anyway). So yes, if `A` has 10 virtual methods, `B`'s vtable must also have 10 entries (plus whichever virtual methods `B` defines itself). Note that the vtable only exists once per class and doesn't affect the... – sepp2k Feb 27 '19 at 01:15
  • ... size of the objects. So it's just consuming a couple of bytes of constant memory per class, which isn't generally an issue. – sepp2k Feb 27 '19 at 01:15
  • @sepp2k could you explain more about void (...)* ? What does it mean and how did you create it? – worldterminator Jul 10 '20 at 10:20
  • @worldterminator `void (...)*` is a pointer to a variadic void function, but the exact type doesn't matter because we always bitcast it before use anyway. You can use any function pointer type you want (or really any pointer type - I just used a function pointer to emphasize that it will always point to a function). I'm not sure what you mean by "creating" it. Do you mean how did I generate the type using the LLVM API? I didn't, I just wrote the LLVM code by hand for this answer, but the way you'd do it would be by using `llvm::PointerType` and `llvm::FunctionType`. – sepp2k Jul 10 '20 at 11:44
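
To make the slot-count point from the comments concrete, here is a minimal C sketch with ten virtual methods in `A` and a single override in `B`. Everything named here (`A_SLOTS`, `build_vtables`, `call_slot`, the placeholder method bodies) is hypothetical. A real compiler would emit both tables as constants rather than fill them at run time, but the rule is the same: copy the parent's slots, then patch the overridden ones.

```c
#include <stddef.h>
#include <string.h>

/* Generic slot type; entries are cast back to their real type before a call. */
typedef void (*fpointer_t)(void);

enum { A_SLOTS = 10 }; /* hypothetical: class A declares ten virtual methods */

struct A {
  const fpointer_t *vtable; /* one pointer per object, whatever A_SLOTS is */
  int x;
};

struct B {
  struct A base;
  int z;
};

typedef int (*A_int_method_t)(struct A *);

/* Placeholder bodies: every A method returns x; B's override of slot 3 adds z. */
static int A_method(struct A *self) { return self->x; }
static int B_method3(struct A *self) {
  return self->x + ((struct B *)self)->z;
}

fpointer_t A_vtable[A_SLOTS];
fpointer_t B_vtable[A_SLOTS];

/* Copy the parent's slots into the child's table, then patch the override. */
void build_vtables(void) {
  for (size_t i = 0; i < A_SLOTS; i++)
    A_vtable[i] = (fpointer_t)A_method;
  memcpy(B_vtable, A_vtable, sizeof A_vtable);
  B_vtable[3] = (fpointer_t)B_method3;
}

/* Code that only knows about A indexes the table without any checks. */
int call_slot(struct A *obj, size_t slot) {
  return ((A_int_method_t)obj->vtable[slot])(obj);
}
```

After `build_vtables()`, calling slot 9 on a `B` object still works because `B_vtable` carries all ten entries. The cost is ten pointers per class, while `sizeof(struct A)` stays at one pointer plus the fields regardless of how many virtual methods exist.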