Cons of virtual methods in CUDA

Question

As far as I understand, virtual method calls are late-bound and thus cannot be inlined by the compiler. Apparently, nvcc relies heavily on inlining. I'm wondering whether virtual methods have any serious disadvantage when used in a CUDA kernel. Is there any situation where they should be avoided? Can they affect performance?

Unless devirtualized at compile time, they are a performance hit (they cost a vtable lookup + an indirect branch). And if threads in a warp don't resolve to the same virtual method (for example when processing an array of objects with different concrete types), you'll get warp divergence. Avoid them as much as you can. Out of curiosity, what kind of application are you writing that requires virtual methods in CUDA code? — user703016, Nov 13 '15 at 10:11
It's not the *method* that "is late binding", it is the method *call* that's late binding. Sometimes. — Kerrek SB, Nov 13 '15 at 10:17
I'm working on an ODE solver. Long story short, I have a method called _solve_ which has two different implementations. I wrote a base class with a pure virtual function and two subclasses that override this method. This solution is easy to maintain, although it might not be optimal. Still, I'm interested to know more about this topic. — eaponte, Nov 13 '15 at 10:23

user703016 · Accepted Answer · 2015-11-13T10:35:20.560

If the compiler can devirtualize the call, it may be able to transform it into a regular method call or even inline it. NVCC's device compiler is built on LLVM (through NVVM), whose optimizer is capable of doing this in some cases. You will have to check the generated code (for example the PTX or SASS) to know whether this is the case.
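If you do not want to depend on the optimizer, one option is to make the concrete type visible at compile time. The following is a minimal sketch, not the asker's code: `EulerStep`, `MidpointStep`, `Decay`, `solve` and the `HD` macro are hypothetical names chosen for illustration. The stepper is a template parameter, so every call has a statically known target and can be inlined.

```cpp
// HD expands to the CUDA qualifiers under nvcc and to nothing in a plain C++ build.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical one-step methods for dy/dt = f(t, y).
struct EulerStep {
    template <class F>
    HD double operator()(F f, double t, double y, double h) const {
        return y + h * f(t, y);
    }
};

struct MidpointStep {
    template <class F>
    HD double operator()(F f, double t, double y, double h) const {
        double k1 = f(t, y);
        return y + h * f(t + 0.5 * h, y + 0.5 * h * k1);
    }
};

// Hypothetical right-hand side: dy/dt = -y.
struct Decay {
    HD double operator()(double /*t*/, double y) const { return -y; }
};

// The stepper type is fixed at compile time, so step(...) is a direct call:
// no vtable lookup and no indirect branch.
template <class Stepper, class F>
HD double solve(Stepper step, F f, double y0, double t0, double h, int n) {
    double t = t0;
    double y = y0;
    for (int i = 0; i < n; ++i) {
        y = step(f, t, y, h);
        t += h;
    }
    return y;
}
```

A call such as `solve(MidpointStep{}, Decay{}, 1.0, 0.0, 0.1, 10)` selects the implementation at compile time. The trade-off is that the choice can no longer be made per object at run time without an explicit branch at the call site.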

If the compiler cannot devirtualize the call, then it may have an impact on performance, particularly if that call is on a hot path. A virtual call requires:

a vtable lookup;
an indirect branch.

The vtable lookup costs a memory access, which is slow (and may "waste" cache lines that could be better used), and indirect branches are expensive in general. Moreover, if not all threads within a warp resolve the virtual method to the same address (for example, when processing an array of objects with different concrete types), the warp diverges and has to execute each distinct target in turn, which is yet another performance hit.
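To make the two steps concrete, here is a rough sketch of what a virtual call amounts to, written by hand in plain host C++. It is an illustration only: `Shape`, `VTable`, `rect_area`, `tri_area` and `call_area` are hypothetical names, and the layout a real compiler generates is ABI-specific and may differ.

```cpp
struct Shape;  // hypothetical type, for illustration only

// One table of function pointers per concrete type: a hand-written "vtable".
struct VTable {
    double (*area)(const Shape*);
};

struct Shape {
    const VTable* vptr;  // stands in for the hidden pointer the compiler adds
    double a;
    double b;
};

inline double rect_area(const Shape* s) { return s->a * s->b; }
inline double tri_area(const Shape* s) { return 0.5 * s->a * s->b; }

const VTable rect_vtable = {rect_area};
const VTable tri_vtable = {tri_area};

// What a virtual call s->area() roughly expands to.
inline double call_area(const Shape* s) {
    const VTable* table = s->vptr;                 // load the table pointer from the object
    double (*target)(const Shape*) = table->area;  // load the function address from the table
    return target(s);                              // indirect branch to that address
}

// With mixed concrete types, target differs from element to element.
inline double total_area(const Shape* shapes, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += call_area(&shapes[i]);
    }
    return sum;
}
```

If each iteration of the loop in `total_area` were handled by a different thread of the same warp, threads holding a rectangle and threads holding a triangle would branch to different addresses, which is the divergence case described above.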

That being said, if you are not calling the virtual method on a hot path, the impact should be negligible. Without further code, it's impossible to tell.
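One way to keep a virtual interface while staying off the hot path is to dispatch once per solve rather than once per step. The sketch below assumes a design similar to the one described in the comments (a base class with a pure virtual `solve` and two subclasses). The names `OdeSolver`, `EulerSolver`, `MidpointSolver`, `rhs`, `integrate` and `HD` are hypothetical, and the right-hand side is hard-coded to dy/dt = -y to keep it short.

```cpp
// HD expands to the CUDA qualifiers under nvcc and to nothing in a plain C++ build.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical right-hand side, dy/dt = -y; a plain function the compiler can inline.
HD inline double rhs(double y) { return -y; }

struct OdeSolver {
    // The only virtual entry point: called once per integration, not once per step.
    HD virtual double solve(double y0, double h, int n) const = 0;
};

struct EulerSolver : OdeSolver {
    HD double solve(double y0, double h, int n) const override {
        double y = y0;
        for (int i = 0; i < n; ++i) {
            y += h * rhs(y);  // hot loop: direct calls only
        }
        return y;
    }
};

struct MidpointSolver : OdeSolver {
    HD double solve(double y0, double h, int n) const override {
        double y = y0;
        for (int i = 0; i < n; ++i) {
            y += h * rhs(y + 0.5 * h * rhs(y));  // hot loop: direct calls only
        }
        return y;
    }
};

// One vtable lookup and one indirect branch, amortized over n steps.
HD inline double integrate(const OdeSolver& solver, double y0, double h, int n) {
    return solver.solve(y0, h, n);
}
```

Whether this helps in practice depends on how many steps each call performs, so it is worth measuring. Note also that in CUDA an object with virtual functions has to be constructed in device code for its virtual functions to be callable there.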

I'm not familiar with the concept of indirect branches, could you explain what that is? — eaponte, Nov 18 '15 at 08:15
@eaponte Direct branches store the destination address right in the instruction, so it is easy to continue fetching at the target of the branch: the fetch stage can just peek at the instructions as they are fetched and get the target address. Indirect branches must fetch a value from memory to find their target address, so the instruction fetch stage won't know where to continue until a potentially long memory access completes. — doug65536, May 05 '16 at 10:42
