
I came across articles that explain the vptr and vtable. I know that, for a class with virtual functions, the first pointer stored in an object is a vptr to the vtable, and that the vtable's entries are pointers to the functions in the same order as they are declared in the class (which I have verified with my test program). But I am trying to understand what code the compiler must generate in order to call the appropriate function.

Example:

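#include <cstdint>
#include <iostream>
using namespace std;

class Base
{
    virtual void func1()
    {
        cout << "Called me" << endl;
    }
};

int main()
{
    Base obj;
    Base *ptr;
    ptr = &obj;

    // void* is not needed. func1 can be accessed directly with obj or ptr using vptr/vtable
    void* ptrVoid = ptr;

    // I can call the first virtual function in the following way.
    // Non-portable hack (undefined behavior): it assumes the vptr is the first
    // word of the object. intptr_t replaces int so the pointers are not
    // truncated on 64-bit targets.
    void (*firstfunc)() = (void (*)(void))(*(intptr_t*)*(intptr_t*)ptrVoid);
    firstfunc();
}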

Questions:

1. What I am really trying to understand is how the compiler replaces the call ptr->func1() with an access through the vptr. If I were to simulate the call, what should I do? Should I overload the -> operator? Even that would not help, as I would not know what the name func1 really is. Even if the compiler accesses the vtable through the vptr, how does it know that the entry for func1 is the first element of the array and the entry for func2 is the second? There must be some mapping from function names to array elements.

2. How can I simulate it? Can you provide the actual syntax the compiler uses to call func1 (how does it replace ptr->func1())?

AAEM
anurag86
  • The compiler internally uses a mapping (a hash table or lookup table or something similar). As regarding the simulation, I once wrote a small snippet that performs a simplified simulation of virtual calls, see https://github.com/vsoftco/snippets/blob/master/virtual_manual.cpp – vsoftco Sep 20 '15 at 07:37
  • @vsoftco: Thanks, I will go over it. BTW, is there any reference on this topic of internal hash tables used by compilers? – anurag86 Sep 20 '15 at 07:49
  • I'm not sure, probably there is, but I'm not an expert on this. One good read is http://www.amazon.com/Inside-Object-Model-Stanley-Lippman/dp/0201834545 – vsoftco Sep 20 '15 at 07:51
  • @vsoftco I don't agree. A map (the idea) exists in the compiler (at compile time) or in interpreters. But these techniques are absent at runtime in languages compiled to "hard code": C++, but also Java and C# (JVM and .NET). The position of a method is fixed, implemented as, say, a `const integer` – Jacek Cz Sep 20 '15 at 08:00
  • @vsoftco (text limit) anurag86 is asking indirectly about runtime, and you are answering about the compiler. If you both understand each other, OK – Jacek Cz Sep 20 '15 at 08:02
  • @JacekCz that's why I said a "simplified simulation". The compiler of course performs something more machine-like, translating at compile time the names of the functions and hashing them. – vsoftco Sep 20 '15 at 08:04
  • @vsoftco OK if we all understand context of our words. Thanks. – Jacek Cz Sep 20 '15 at 08:06

2 Answers


Don't think of a vtable as an array. It's only an array if you strip it of everything C++ knows about it other than the size of its members. Instead, think of it as a second struct whose members are all pointers to functions.

Suppose I have a class like this:

struct Foo {
    virtual void bar();
    virtual int baz(int qux);
    int quz;
};

int callSomeFun(Foo* foo) {
    foo->bar();
    return foo->baz(2);
}

Breaking it down 1 step:

struct Foo;
// adding Foo* parameter to simulate the this pointer, which
// in the above would be a pointer to foo.
struct FooVtable {
    void (*bar)(Foo* foo);
    int (*baz)(Foo* foo, int qux);
};
struct Foo {
    FooVtable* vptr;
    int quz;
};

int callSomeFun(Foo* foo) {
    foo->vptr->bar(foo);
    return foo->vptr->baz(foo, 2);
}

I hope that's what you're looking for.

sqykly
  • I agree, on the condition that you add a caveat: "array" has slightly different definitions across languages, from assembly to high level; some treat array and table as the same word, some do not. Picturing it as a C struct of function pointers is fine too, agreed – Jacek Cz Sep 20 '15 at 11:55
  • @JacekCz yeah it's just a different way of looking at the same pile of bits. – sqykly Sep 21 '15 at 03:22
  • No, the two are different. If it were an array, we wouldn't be able to use -> to access the elements; structs/classes, on the other hand, let us use vptr->bar. – anurag86 Sep 21 '15 at 03:37
  • @anurag86 we're talking about at the machine level. The CPU doesn't know or care what operators C++ allows you to use, what the return types or arguments are, etc etc. The difference between `->` and `[]` is C++ syntax, it converts both into the same address mode for the assembler. I can take the breakdown one step farther if you like and hand-compile the example to assembly to show you why they're the same, but I think it won't help if you don't know any assembly. – sqykly Sep 21 '15 at 11:10

The background:

  1. After compilation (without debug info), C/C++ binaries contain no names, and names aren't required at runtime; it is only machine code.

  2. You can think of the vptr like a classic C function pointer, in the sense that the type, argument list etc. are known.

  3. It isn't important at which positions func1, func2 etc. are placed; the only requirement is that the order is always the same (so all parts of a multi-file C++ program must be compiled the same way, with the same compiler settings etc.). Let's imagine the positions follow declaration order: FIRST the parent class's virtuals, then those newly declared in the derived class, BUT reimplemented (overridden) virtuals stay at the lower positions inherited from the parent.

This is only a mental picture. The implementation must correctly dispatch overrides for a call such as classAPointer->methodReimplementedInB().

  4. A C++ compiler usually has/had (my knowledge dates from the years of the 16/32-bit migration) 2-4 options to optimize vtables for speed/size etc. Classic C sizeof() was easy to understand (size of the data plus any alignment padding); in C++ sizeof is bigger, by 2, 4 or 8 bytes depending on the implementation.

  5. Few conversion tools can convert "object" files, e.g. from MS format to Borland etc., and usually only classic C was possible/safe, because the machine-code implementation of the vtable is unknown.

  6. It is hard to reach the vtable from high-level code; use analysers for the intermediate files (.obj, . etc)

EDIT: story about runtime is different than about compilation. My answer is about compiled code & runtime

EDIT2: quasi-assembler code (from memory)

load ax, 2
call vt[ax]

vt:
0x123456
0x126785  // virtual parent func1()

derived:

vt:
0x123456
0x126999 // overridden func1()
0x456788 // new method

EDIT3: BTW, I can't totally agree that C++ is always faster than JVM/.NET because "these are interpreted". C++ has a share of "interpretation" too, and the interpreted share is growing: real component/GUI frameworks have interpreted connections as well (a map, for example). Outside our discussion: which memory model is better, C++ delete or GC?

Jacek Cz
  • @J: Yes, in the end it is machine code. But even if the names are not necessary at runtime, the compiler must implement some mechanism to decide _which function should I call_, so names are needed while the call is being resolved. To do that it needs a lookup. Of course the resolution wouldn't be part of the obj, because we are talking about late binding here. The question remains how the call is resolved. What is the logic behind the scenes? – anurag86 Sep 20 '15 at 08:04
  • OK. Assume a multi-file project with class overrides etc., where files are compiled independently (the newest C++ IDEs take a final look at all files together). Assume the old way: the function pointer to func1() is at the 2nd position AND CODED IN BINARY, and func1() is overridden. Part of the resolution is done by the compiler, but the final resolution (the jump to a virtual method, known by address, not by name) happens at runtime – Jacek Cz Sep 20 '15 at 08:11
  • I understand your point that the overridden 0x126999 _would be called_ in your example. But my question is different. As an implementor I take the "_would be called_" part for granted. The question is more about _how_. In other words, imagine you are building your own compiler: how would you handle a call to a member function (given that you have the vptr and vtable, and ignoring the complexities of multiple files)? How would you know that ptr->func1 is a call to func1 and not func2? – anurag86 Sep 20 '15 at 09:43
  • func1() (and its overrides) has a constant position in the vtables (e.g. 2) and CANNOT be confused with func2() (position, say, 5). I'm not a native English speaker and have trouble making this clear. BTW, when func1 is overloaded (`func1(int)` and `func1(char *)`), these are TWO independent functions at this level – Jacek Cz Sep 20 '15 at 09:49
  • The compiler writer builds a map (hash table) `map` with entries `"func1(void)"` -> `2`, `"func2(void)"` -> `5` (and, in my new example, `"func1(char*)"` -> `3`), and the generated code contains only 2, 3 and 5. The strategy? Class declaration order? Alphabetical? Not so important, as long as it is always the same – Jacek Cz Sep 20 '15 at 09:52
  • Yes!!!! This is what I was looking for. So it maintains a hash table internally. :) – anurag86 Sep 20 '15 at 09:53
  • I'm happy. BTW, that's why functions have internal names like "func1" in C and "func1(int)" or "func1$543k5" in C++ semi-compiled code. I lied to you: names are present in the intermediate form (*.obj on Windows) because the linker requires them, but they are lost in the final *.exe in a non-debug configuration. The linker does not have much intelligent work to do (for the purposes of our thread), so I'll skip over it – Jacek Cz Sep 20 '15 at 10:01
  • Ok, and how about dynamically linked libs (.so, .dll)? By the way, is it even possible to use dynamic polymorphism with dynamically linked code? – Marandil Sep 20 '15 at 13:56
  • 1. For a DLL (.so) the LINKING is dynamic (the OS module usually has/had the name "loader", which sounds friendly?), but after linking it works the same way (a call to an address known in one step). 2. I think with pure, legally used C/C++ code (no walking over binary addresses etc.) it is impossible; I agree it is possible with pointer hacking (if OS security allows). 3. Almost all virtual machines/interpreters are implemented in C/C++, so dynamic polymorphism is possible? ... ;) – Jacek Cz Sep 20 '15 at 14:19