7

The following code calls the virtual function foo through a pointer to a derived object. Will this call go through the vtable, or will it call B::foo directly?

If it goes through the vtable, what is the idiomatic C++ way to make it call B::foo directly? I know that in this case the pointer always points to a B.

class A
{
    public:
        virtual void foo() {}
};

class B : public A
{
    public:
        virtual void foo() {}
};


int main()
{
    B* b = new B();
    b->foo();
}
Kuba hasn't forgotten Monica
aaa
  • 1
    Are you trying to optimize? (Don't waste your time; that's the compiler's job.) Or do you want a technique to just call B's version of foo()? – Martin York Dec 16 '10 at 18:53
  • 3
    You should not really worry about whether the dispatch will be direct or go through a vtable. In most scenarios, virtual method table dispatch will not have a significant impact on performance. – David Rodríguez - dribeas Dec 16 '10 at 18:56

6 Answers

10

Most compilers will be smart enough to eliminate the indirect call in that scenario, if you have optimization enabled. But only because you just created the object and the compiler knows the dynamic type; there may be situations when you know the dynamic type and the compiler doesn't.
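
To illustrate the distinction, here is a minimal sketch. The function names `callOnFreshObject` and `callThroughBase` are hypothetical (they are not from the question), and `foo` returns an int here only so the result is observable. In the first function the object is created in plain sight, so an optimizer may be able to prove the dynamic type and call `B::foo` directly. In the second it only sees an `A&` and generally has to go through the vtable unless the caller gets inlined. Whether either call is actually devirtualized depends on the compiler and its flags.

```cpp
struct A {
    virtual ~A() = default;
    virtual int foo() { return 1; }
};

struct B : A {
    int foo() override { return 2; }
};

// The object is created right here, so the dynamic type (exactly B)
// is visible to the compiler at the call site.
int callOnFreshObject() {
    B b;
    B* p = &b;
    return p->foo();
}

// Here the compiler only knows "some A or something derived from A";
// the programmer may know more than the compiler does.
int callThroughBase(A& a) {
    return a.foo();
}
```

Both functions behave identically from the language's point of view; the difference is only in how much the optimizer can prove.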

Ben Voigt
  • In which case you can use `static_cast` to force the dynamic type to be known..... – Billy ONeal Dec 16 '10 at 18:33
  • 3
    @Billy: I'm not so sure. `static_cast` just tells the compiler that the dynamic type is a (non-strict) subclass of `B`, not that it is exactly `B`, so the optimization doesn't apply IMO. – Ben Voigt Dec 16 '10 at 18:35
  • @Billy: Why try to fool the compiler? The compiler already knows more than a human about the code and how to optimize it. Just let it do its job. – Martin York Dec 16 '10 at 18:56
  • 1
    I agree with @Ben: if you want to call a specific version, you must use the qualified function name (as in `p->B::foo()`, with `p` being a pointer to `B` or to a type derived from `B`; a `static_cast` might be required there if the actual pointer is to a base). `static_cast` on its own does not help, as the compiler will not know whether the type you have cast to is the *final overrider* or not. – David Rodríguez - dribeas Dec 16 '10 at 19:00
  • 1
    @Ben, @David: C++0x introduced the `final` attribute if I recall correctly (not that I like the idea much...), I suppose this will open up a number of optimizations of this sort. – Matthieu M. Dec 16 '10 at 19:24
  • @Ben: @Martin: Good points :) My assumption before making a change like that would be that someone profiled and found the virtual call to be A. unnecessary and B. expensive in terms of overall program time. Didn't think about the fact that what you `static_cast` to might have subclasses though.... In any case, I didn't know about the syntax in @robert's answer, which does what I thought the static_cast would do. – Billy ONeal Dec 17 '10 at 03:10
  • @Matthieu: It's way too early to talk about C++0x in the past tense. Anyway, can you provide the section number that describes it? Section `[dcl.attr.final]` has been struck out in the draft I'm using (n3225). Languages like Java, where everything is virtual by default, need a keyword to make methods non-virtual. In C++, you just don't make the method virtual in the first place, so `final` is of questionable utility. – Ben Voigt Dec 17 '10 at 14:31
  • Ahh, `final` is now in `[class.mem]`, using a syntax closer to C++/CLI. And it's not an attribute. – Ben Voigt Dec 17 '10 at 14:33
  • @Ben Voigt: I am doubtful of the value of `final` (I prefer to mark my method as "override at your own risk" but that may be a C++ mindset :P), sorry that I did not provide the reference, I am trying to hone my standardista's reflexes but I am not there yet :) – Matthieu M. Dec 17 '10 at 15:43
7

As usual, the answer to this question is "if it is important to you, take a look at the emitted code". This is what g++ produces with no optimisations selected:

18     b->foo();
0x401375 <main+49>:  mov    eax,DWORD PTR [esp+28]
0x401379 <main+53>:  mov    eax,DWORD PTR [eax]
0x40137b <main+55>:  mov    edx,DWORD PTR [eax]
0x40137d <main+57>:  mov    eax,DWORD PTR [esp+28]
0x401381 <main+61>:  mov    DWORD PTR [esp],eax
0x401384 <main+64>:  call   edx

which is using the vtable. A direct call, produced by code like:

B b;
b.foo();

looks like this:

0x401392 <main+78>:  lea    eax,[esp+24]
0x401396 <main+82>:  mov    DWORD PTR [esp],eax
0x401399 <main+85>:  call   0x40b2d4 <_ZN1B3fooEv>
unquiet mind
  • @David Indeed. But which optimisations? My point is that you need to look at the code in order to use optimisations efficiently. – unquiet mind Dec 16 '10 at 19:11
  • 1
    @Unquiet: G++, so either -O2 or -O3 optimizations. Also, I'll say what I've said to everyone saying "look at the assembly" -- not all of us know assembly, and assuming everyone using a higher level language like C++ knows it is kind of unreasonable. – Billy ONeal Dec 17 '10 at 03:12
  • @Billy If you don't know the basics of assembler for your platform, then IMHO you have no business being a programmer. It's not exactly rocket science, after all. – unquiet mind Dec 17 '10 at 09:23
  • 1
    @unquiet mind: Interesting that I know *several* programmers, not one of whom knows assembler. It's just really not necessary for the vast majority of people. – Billy ONeal Dec 17 '10 at 13:42
  • @Billy I too know lots of "programmers" who don't know assembly. But Sturgeon's Law applies to programmers too. – unquiet mind Dec 17 '10 at 14:01
  • @Billy: One can't be a *software engineer* without at least being able to read assembly. Sadly, most *programmers* are sorely lacking in software engineering skills. (Related: many programmers, including software engineers, are sorely lacking in *computer science* skills). OTOH for performance questions in general, and implementation details questions especially, which this question certainly is, some knowledge of assembly **is** part of the essential prerequisite knowledge, even if it isn't for questions of the form "why does `double x = 3/2;` equal to 1.00?". – Ben Voigt Dec 17 '10 at 14:41
  • @Ben: Understanding how the underlying hardware operates is useful. That does not mean one needs to actually know assembly to effectively use the platform. 90% of the time I see people asking questions like this, they're asking because they **don't** know assembly, and therefore cannot do the kind of analysis advocated here. Most software developers of any description do not know assembly. Therefore an answer saying "just look at the assembly" isn't really a good answer. – Billy ONeal Dec 17 '10 at 14:59
4

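In principle, yes: a call to a virtual function through a pointer is dispatched via the vtable, unless the compiler can prove the dynamic type and devirtualize the call. To force a direct call to B::foo() on b, use the qualified name: b->B::foo().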
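
A minimal sketch of the difference follows. The class `C` and the two helper functions are hypothetical additions for illustration, and `foo` returns an int so the chosen overrider is visible. A qualified call such as `p->B::foo()` suppresses virtual dispatch, so it runs `B::foo` even when the object is really a more derived type.

```cpp
struct A {
    virtual ~A() = default;
    virtual int foo() { return 1; }
};

struct B : A {
    int foo() override { return 2; }
};

// Hypothetical further-derived class, used to show what the
// qualified call ignores.
struct C : B {
    int foo() override { return 3; }
};

// Ordinary virtual call: selects the final overrider of the dynamic type.
int viaVirtualDispatch(B* p) {
    return p->foo();
}

// Qualified call: always B::foo, no dynamic dispatch.
int viaQualifiedName(B* p) {
    return p->B::foo();
}
```

Note the trade-off: if the pointer ever refers to a `C`, the qualified call silently skips `C::foo`, so it is only safe when you really do know the object is exactly a `B`.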

robert
  • 4
    For the code in the question, not only will most optimizing compilers not use the v-table, most will inline the empty body, and the v-table itself may be eliminated by the linker since it isn't used. – Ben Voigt Dec 16 '10 at 18:32
  • 2
    @Ben Voigt yes, that's very possible. I assume the code the original poster is looking at is much more complicated, and that may not be the case. – robert Dec 16 '10 at 18:34
  • Thanks, this syntax is what I was missing. I am aware of the other issues related to optimizations etc. – aaa Dec 16 '10 at 19:22
4

This is the compiled code from g++ (4.5) with -O3

_ZN1B3fooEv:
    rep
    ret

main:
    subq    $8, %rsp
    movl    $8, %edi
    call    _Znwm
    movq    $_ZTV1B+16, (%rax)
    movq    %rax, %rdi
    call    *_ZTV1B+16(%rip)
    xorl    %eax, %eax
    addq    $8, %rsp
    ret

_ZTV1B:
    .quad   0
    .quad   _ZTI1B
    .quad   _ZN1B3fooEv

The only optimization it did was that it knew which vtable to use (on the b object). Otherwise "call *_ZTV1B+16(%rip)" would have been "movq (%rax), %rax; call *(%rax)". So g++ 4.5 is actually quite bad at optimizing virtual function calls.

Emil
  • GCC 4.6 and later produce `call _ZN1B3fooEv` so devirtualize successfully and call `B::foo()` directly (http://goo.gl/wxcSiw) – Jonathan Wakely Dec 17 '14 at 21:37
  • I also see that. But why can't it optimize away (inline) the function call since the method body of B::foo is empty... – Emil Dec 18 '14 at 22:28
  • GCC 4.7+ does inline it, just 4.6 doesn't (note that the link in my previous comment didn't define the member, specifically so it wouldn't be optimised away and the call would be shown) – Jonathan Wakely Dec 18 '14 at 22:30
1

The compiler can optimize away virtual dispatch and call the virtual function directly, or inline it, if it can prove that the behavior is the same. In the provided example, the compiler can easily throw away every line of code, so all you'll get is this:

int main() {}
Gene Bushuyev
  • 2
    The compiler is not allowed to remove the call to new. It has side effects that the compiler cannot analyse, as it results in calls to the underlying library for memory allocation. – Martin York Dec 16 '10 at 20:17
0

I changed the code up a bit to give it a go myself, and to me it looks like it's dropping the vtable, but I'm not expert enough in asm to tell. I'm sure some commentators will set me right though :)

struct A {
    virtual int foo() { return 1; }
};

struct B : public A {
    virtual int foo() { return 2; }
};

int useIt(A* a) {
    return a->foo();
}

int main()
{
    B* b = new B();
    return useIt(b);
}

I then converted this code to assembly like this:

g++ -g -S -O0  -fverbose-asm virt.cpp 
as -alhnd virt.s > virt.base.asm
g++ -g -S -O6  -fverbose-asm virt.cpp 
as -alhnd virt.s > virt.opt.asm

To me, the interesting bits suggest that the 'opt' version is dropping the vtable. It looks like it's creating the vtable but not using it.

In the opt asm:

9:virt.cpp      **** int useIt(A* a) { 
89                    .loc 1 9 0 
90                    .cfi_startproc 
91                .LVL2: 
10:virt.cpp      ****     return a->foo(); 
92                    .loc 1 10 0 
93 0000 488B07        movq    (%rdi), %rax    # a_1(D)->_vptr.A, a_1(D)->_vptr.A 
94 0003 488B00        movq    (%rax), %rax    # *D.2259_2, *D.2259_2 
95 0006 FFE0          jmp *%rax   # *D.2259_2 
96                .LVL3: 
97                    .cfi_endproc 

and the base.asm version of the same:

  9:virt.cpp      **** int useIt(A* a) { 
  88                    .loc 1 9 0 
  89                    .cfi_startproc 
  90 0000 55            pushq   %rbp    # 
  91                .LCFI6: 
  92                    .cfi_def_cfa_offset 16 
  93                    .cfi_offset 6, -16 
  94 0001 4889E5        movq    %rsp, %rbp  #, 
  95                .LCFI7: 
  96                    .cfi_def_cfa_register 6 
  97 0004 4883EC10      subq    $16, %rsp   #, 
  98 0008 48897DF8      movq    %rdi, -8(%rbp)  # a, a 
  10:virt.cpp      ****     return a->foo(); 
  99                    .loc 1 10 0 
 100 000c 488B45F8      movq    -8(%rbp), %rax  # a, tmp64 
 101 0010 488B00        movq    (%rax), %rax    # a_1(D)->_vptr.A, D.2263 
 102 0013 488B00        movq    (%rax), %rax    # *D.2263_2, D.2264 
 103 0016 488B55F8      movq    -8(%rbp), %rdx  # a, tmp65 
 104 001a 4889D7        movq    %rdx, %rdi  # tmp65, 
 105 001d FFD0          call    *%rax   # D.2264 
  11:virt.cpp      **** } 
 106                    .loc 1 11 0 
 107 001f C9            leave 
 108                .LCFI8: 
 109                    .cfi_def_cfa 7, 8 
 110 0020 C3            ret 
 111                    .cfi_endproc 

On line 93 we see in the comments: _vptr.A which I'm pretty sure means it's doing a vtable lookup, however, in the actual main function, it seems to be able to predict the answer and doesn't even call that useIt code:

 16:virt.cpp      ****     return useIt(b);
 17:virt.cpp      **** }
124                    .loc 1 17 0
125 0015 B8020000      movl    $2, %eax    #,

which I think is just saying: we know we're going to return 2, so let's just put it in eax. (I reran the program asking it to return 200, and that line got updated as I would expect.)


extra bit

So I made the program a bit more complicated:

struct A {
    int valA;
    A(int value) : valA(value) {}
    virtual int foo() { return valA; }
};

struct B : public A {
    int valB;
    B(int value) : A(0), valB(value) {}  // base first, matching actual initialization order
    virtual int foo() { return valB; }
};

int useIt(A* a) {
    return a->foo();
}

int main()
{
    A* a = new A(100);
    B* b = new B(200);
    int valA = useIt(a);
    int valB = useIt(b);
    return valA + valB;
}

In this version, the useIt code definitely uses the vtable in the optimized assembly:

  13:virt.cpp      **** int useIt(A* a) {
  89                    .loc 1 13 0
  90                    .cfi_startproc
  91                .LVL2:
  14:virt.cpp      ****     return a->foo();
  92                    .loc 1 14 0
  93 0000 488B07        movq    (%rdi), %rax    # a_1(D)->_vptr.A, a_1(D)->_vptr.A
  94 0003 488B00        movq    (%rax), %rax    # *D.2274_2, *D.2274_2
  95 0006 FFE0          jmp *%rax   # *D.2274_2
  96                .LVL3:
  97                    .cfi_endproc

This time, the main function inlines a copy of useIt, but does actually do the vtable lookup.


What about C++11 and the 'final' keyword?

So I changed one line to:

virtual int foo() override final { return valB; }

and the compiler line to:

g++ -std=c++11 -g -S -O6  -fverbose-asm virt.cpp

My thinking was that telling the compiler this is a final override might allow it to skip the vtable.

Turns out it still uses the vtable.
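
One possible explanation, sketched here under my own assumptions: `useIt` receives an `A*`, and `final` on `B::foo` says nothing about what an arbitrary `A*` points to, so the call inside `useIt` still has to be dispatched dynamically. `final` can only help where the static type is already `B` (or derived from it). The helper names `useItViaA` and `useItViaB` are hypothetical, and whether a given compiler actually emits a direct call in the second case depends on its version and flags.

```cpp
struct A {
    virtual ~A() = default;
    virtual int foo() { return 1; }
};

struct B : A {
    // No class derived from B can override foo any further.
    int foo() override final { return 2; }
};

// Static type is A*: the object could be an A, a B, or any other
// class derived from A, so final on B::foo does not apply here.
int useItViaA(A* a) {
    return a->foo();
}

// Static type is B*: foo is final in B, so the only possible target
// is B::foo and the compiler is free to call it directly.
int useItViaB(B* b) {
    return b->foo();
}
```

The observable behavior is the same either way; only the generated call sequence may differ, so checking it still means inspecting the assembly as above.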


So my theoretical answer would be:

  • I don't think there are any explicit "don't use the vtable" optimizations. (I searched through the g++ manpage for vtable and virt and the like and found nothing.)
  • But g++ with -O6 can do a lot of optimization on a simple program with obvious constants, to the point where it can predict the result and skip the call altogether.
  • However, once things get complex (read: real), it's definitely doing vtable lookups pretty much every time you call a virtual function.
matiu