Suppose I have the following code:

void f(PolymorphicType *p)
{
    for (int i = 0; i < 1000; ++i)
    {
        p->virtualMethod(something);
    }
}

Will the compiler's generated code dereference p's vtable entry for virtualMethod 1 or 1000 times? I am using Microsoft's compiler.

Edit:

Here is the generated assembly for the real-world case I'm looking at. line->addPoint() is the virtual method of concern. I have no assembly experience, so I'm going over it slowly...

; 369  :        for (int i = 0; i < numPts; ++i)

    test    ebx, ebx
    je  SHORT $LN1@RDS_SCANNE
    lea edi, DWORD PTR [ecx+32]
    npad    2
$LL3@RDS_SCANNE:

; 370  :        {
; 371  :            double *pts = pPoints[i].SystemXYZ;
; 372  :            line->addPoint(pts[0], pts[1], pts[2]);

    fld QWORD PTR [edi+8]
    mov eax, DWORD PTR [esi]
    mov edx, DWORD PTR [eax+16]
    sub esp, 24                 ; 00000018H
    fstp    QWORD PTR [esp+16]
    mov ecx, esi
    fld QWORD PTR [edi]
    fstp    QWORD PTR [esp+8]
    fld QWORD PTR [edi-8]
    fstp    QWORD PTR [esp]
    call    edx
    add edi, 96                 ; 00000060H
    dec ebx
    jne SHORT $LL3@RDS_SCANNE
$LN314@RDS_SCANNE:

; 365  :        }
japreiss
  • Ask the compiler to generate assembler code and check. – Some programmer dude Feb 07 '13 at 17:14
  • Compile it with optimizations and look at the resultant generated code. – Anya Shenanigans Feb 07 '13 at 17:15
  • To add to Joachim's comment - *there is no other way* to find out other than checking. – Bartek Banachewicz Feb 07 '13 at 17:15
  • By the way, `p` doesn't have a vtable. `PolymorphicType` has a vtable, and `*p` has a *pointer* to that vtable. – Kerrek SB Feb 07 '13 at 17:16
  • ... and even if it does dereference the pointer 1000 times, it will still take only a microsecond or two. – Mr Lister Feb 07 '13 at 17:21
  • @MrLister: Your estimate is high. Since the call will hit in the BTB the only overhead is the extra decode (the value loaded is only used to confirm the BTB hit), so I would expect an overhead of 1/3 cycle per iteration on a modern x86 processor. Or about 160ns at 2GHz. – Chris Dodd Feb 07 '13 at 17:36
  • I'm not sure what the compiler is allowed to assume here either. Calling ->virtualFunction *could* change the thing that p points to: what if the object pointed to by p is deleted and a new object of a different subtype is created in that location? (I don't know if that's possible or legal, but it's the kind of thing that compiler writers have to think about when they are generating code...) They have to be VERY careful what they assume... – jcoder Feb 07 '13 at 17:37
  • @ChrisDodd Ah, optimisation at the hardware level. – Mr Lister Feb 07 '13 at 17:38
  • @jcoder I'd imagine some sort of aliasing rule sidesteps that issue. – Cory Nelson Feb 07 '13 at 17:39
  • I would guess the answer might also depend on whether the pointer `p` or `line` has automatic storage duration, and whether a pointer or reference to that pointer is ever created. – aschepler Feb 07 '13 at 17:40
  • @Cory Indeed, I'm just thinking if _you_ were writing a compiler would you want to assume that nothing the called function does could change where the next call goes to given all the complexities of c++ or would you just decide to load it each time.... :) My point is that it may be very difficult for the compiler to prove that this is a valid optimisation given if it can't see the definition of all the code called directly and indirectly by the function. – jcoder Feb 07 '13 at 17:41
  • OK, so `mov edx, DWORD PTR [eax+16]` ... `call edx` inside the loop. I guess it's looking at the vtable every time. – japreiss Feb 07 '13 at 17:44
  • @jcoder I concur. The question is easy enough to answer (compile to asm and check). The bigger question alluded to is whether such action is *allowed* by the standard, whether it is covered at all, or whether it punts and leaves it entirely up to the implementation. Assuming code emitted by a compiler is synonymous with rules brought forth by the standard is a grand idea, but seems a touch cart-before-the-horse to me regarding questions like this. – WhozCraig Feb 07 '13 at 17:49

2 Answers

In general, no: the compiler cannot load the vtable entry once and reuse it for all 1000 calls. The function could destroy *this and placement-new some other object derived from the same base in that space.

Edit: even easier, the function could just change p. The compiler cannot possibly know who has the address of p, unless it is local to the optimization unit in question.

n. m. could be an AI

Impossible in general, but there are special cases that can be optimized, especially with inter-procedural analysis. VS2012 with full optimizations and whole-program optimization compiles this program:

#include <iostream>

using namespace std;

namespace {
struct A {
  virtual void foo() { cout << "A::foo\n"; }
};

struct B : public A {
  virtual void foo() { cout << "B::foo\n"; }
};

void test(A& a) {
  for (int i = 0; i < 100; ++i)
    a.foo();
}
}

int main() {
  B b;
  test(b);
}

to:

01251221  mov         esi,64h  
01251226  jmp         main+10h (01251230h)  
01251228  lea         esp,[esp]  
0125122F  nop  
01251230  mov         ecx,dword ptr ds:[1253044h]  
01251236  mov         edx,12531ACh  
0125123B  call        std::operator<<<std::char_traits<char> > (012516B0h)  
01251240  dec         esi  
01251241  jne         main+10h (01251230h)  

so it has effectively optimized the loop to:

for (int i = 0; i < 100; ++i)
  cout << "B::foo\n";
Casey