
Are there techniques/libraries that allow the flexibility of having a class hierarchy (that has virtual functions) yet, once the objects types have been determined at runtime, allows devirtualization of the function calls?

For a simple example, suppose I have a program that reads shape types (circle, rectangle, triangle, etc.) from a configuration file to construct some data structure of said shapes (e.g., vector<shape*>):

class shape {
public:
    virtual void draw() const = 0;
    // ...
};

class circle : public shape {
public:
    void draw() const;
    // ...
};

// ...
vector<shape*> shapes;

Obviously, if I want to draw all the shapes, I can do:

for ( auto&& s : shapes )
    s->draw();

Every time such an iteration over shapes is done, a virtual function call is made to call draw() for every shape.

But suppose once shapes is created, it's never going to change again for the lifetime of the program; and further suppose that draw() is going to be called many times. It would be nice if, once the actual shapes are known, there were a way to "devirtualize" the calls to draw() at runtime.

I know about optimization techniques to devirtualize virtual function calls at compile time, but I am not asking about that.

I'd be very surprised if there were a clever hack to do this in C++ directly since one way of doing this would be to modify the in-memory machine code at runtime. But is there some C++ library out there that enables such a thing?

I imagine something like this might be possible via LLVM since that allows generation of machine code at runtime. Has anybody used LLVM for this? Perhaps a framework layered on top of LLVM?

NOTE: the solution has to be cross platform, i.e., at least using gcc/clang and VC++.

Paul J. Lucas
  • How would this devirtualization work if there were different shapes in the list? Would it (conceptually) replace the s->draw() call inside the for loop with a (if type == Circle) s->circle::draw(); else if (type == Square) s->square::draw(); type of instructions? If so, I'm not sure that would necessarily be any more efficient. – Jeremy Friesner May 19 '15 at 00:03
  • If there are no further subclasses of shape, you get to "devirtualize" this yourself, by typedef-ing one to the other. But, as long as there are two or more subclasses, unless the compiler can somehow prove to itself that a particular chunk of code will only use a specific subclass, it has to use a virtual function call. It's unlikely, but possible. After "auto x=new circle;" the compiler knows that, as long as it has "x" at hand, "x->draw()" can be safely devirtualized, and many compilers probably already do this. Unfortunately, the scope of such optimizations tends to be rather small. – Sam Varshavchik May 19 '15 at 00:04
  • Have you performed profiling that indicates that the vtable jump has measurable performance impact? vtable calls are still *very* cheap. Premature optimization is the root of all evil, according to Donald Knuth. – Dai May 19 '15 at 00:05
  • GCC has an extension that allows you to store the dynamically looked-up function pointer. – Kerrek SB May 19 '15 at 00:06
  • VC has the `__declspec(novtable)` attribute, it's intended for pure-interfaces and is a compile-time feature, but might be of relevance to your use-case. – Dai May 19 '15 at 00:06
  • @JeremyFriesner: the devirtualization would work by making a 1-time pass over the vector and, based on the actual type of the shape, replacing the virtual function call with a regular function call for the corresponding shape. – Paul J. Lucas May 19 '15 at 00:09
  • @SamVarshavchik: Again, I'm _not_ asking about _compile time_ optimization, so phrases like "... unless the compiler ..." are missing the point. – Paul J. Lucas May 19 '15 at 00:11
  • I've never heard of an automated way to do that, but you could always do it manually by placing your objects into different lists based on their types. Then you could iterate over each of the per-type lists calling the appropriate non-virtual method for each list. – Jeremy Friesner May 19 '15 at 00:13
  • @JeremyFriesner: My use of `vector` was just an example. I could equally well have some kind of tree data structure of objects, e.g., an eval tree for expressions. – Paul J. Lucas May 19 '15 at 00:15
  • @KerrekSB: Can you elaborate? I.e., the name of the extension? Regardless, even assuming I could store the pointer, calling it like `(*f)()` would be an indirect function call that's pretty much the same thing as a virtual function call. – Paul J. Lucas May 19 '15 at 00:23
  • A virtual call *per se* is substantially free. After a few calls, the branch predictor picks it up, and speculatively executes the correct function without waiting for the vtable lookup - IOW, your devirtualizer is already there, implemented in hardware. The real cost is the missed potential for inlining, but that would require a full recompilation and optimization of the function at runtime, so it's definitely JIT material (it's not just a matter of patching a few bits in the function body). – Matteo Italia May 19 '15 at 00:24
  • *"But is there some C++ library out there"* - requests for library suggestions are off-topic on S.O.. More generally, if you need runtime code generation, you can `system()` out to your compiler and `dlopen()`-or-similar your custom function. – Tony Delroy May 19 '15 at 01:47
  • @PaulJ.Lucas: [see here](https://gcc.gnu.org/onlinedocs/gcc/Bound-member-functions.html). – Kerrek SB May 19 '15 at 08:20
  • @MatteoItalia: I don't see how a branch predictor could know to inspect a C++ `vector` to know which actual function is going to be called. – Paul J. Lucas May 19 '15 at 19:44
  • @PaulJ.Lucas: the branch predictor aims to remember what actual location is most often the target of each "hot" indirect jump. How you obtained that address is irrelevant; if it's relatively constant at each iteration it's going to be predicted fairly well. Of course in moderately complex situations (say you are calling virtual functions on a vector of objects of mixed types) it's going to fail miserably, but since you are talking about "temporarily concretizing" the virtual calls you are already expecting a situation where the target isn't going to change so quickly. – Matteo Italia May 19 '15 at 19:48
  • Also: *not* inspecting the C++ vector or the flags register is exactly the point of branch prediction - it has to work way before the rest of the pipeline has finished elaborating the data dependencies of the jump, because feeding the pipeline depends on finding out beforehand what direction has to be taken at each conditional/indirect jump. – Matteo Italia May 19 '15 at 19:57
  • BTW, it's not like your idea is a bad one - actually, a similar concept is at the basis of tracing JITs like PyPy or LuaJIT: where the language is *so* dynamic that almost every opcode is in theory an indirect jump (as the types involved may be anything), a good way to make it fast is to trace the execution in the interpreter and, after finding a good hot zone where the types are always the same, "concretize" it in actual machine code tailored for the types/code paths currently involved. – Matteo Italia May 19 '15 at 19:59
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/78248/discussion-on-question-by-paul-j-lucas-c-devirtualization-at-runtime). – Taryn May 19 '15 at 20:01
  • Does this answer your question? [How to get "direct" function pointer to a virtual member function?](https://stackoverflow.com/questions/20520756/how-to-get-direct-function-pointer-to-a-virtual-member-function) – JohnAl Feb 02 '22 at 20:56
  • @JohnAl No because it's not standard (i.e., guaranteed to work on any implementation). – Paul J. Lucas Feb 03 '22 at 21:17
  • @PaulJ.Lucas you just said "the solution has to be cross platform, i.e., at least using gcc/clang and VC++". The answer described there works with those three compilers. But yeah, it is not "guaranteed to work on any implementation". – JohnAl Feb 04 '22 at 10:34

1 Answer


I'm fairly certain there's no magic "Here, compiler: there are no more subclasses, and this list is never going to change, so eliminate the virtual function call overhead" directive.

One thing that can help with virtual calls in extremely performance-critical situations is to sort your list of shapes by subtype. For example, instead of a sporadic pattern of subtypes like circle, rectangle, triangle, rectangle, triangle, square, etc., you rearrange the container into homogeneous runs: circle, circle, circle, ..., square, square, square, .... This is effectively optimizing for branch prediction: the indirect-call target stays the same for long stretches, so the predictor's job becomes easy. I don't know how much mileage this still yields with the latest architectures and optimizers, but there was at least a time, not too long ago, when it was very useful.

About JITs, I've been exploring that area a bit. I wouldn't necessarily recommend trying to find a JIT solution to magically make your C++ code faster.

Instead, I've been exploring it because my software already has a domain-specific language: a visual, nodal GUI programming language where you draw connections between nodes (functions) instead of writing code, used to build things like shaders and image filters (similar to Unreal Engine 4's Blueprints). It currently works more like an interpreter and is nowhere near as fast as handwritten native code, which is why I was interested in exploring a code-generation/JIT route.

I've tried Tiny C and LCC for this, but one thing I found rather disappointing is that their optimizers aren't nearly as sophisticated as those of commercial production compilers: I often got results averaging 3 to 4 times slower than MSVC or GCC. They're otherwise wonderful, since they're so featherweight and easy to embed.

LLVM seems like a wonderful match except that it's enormous. We have a kind of old-school aesthetic in our core-level code where anything meant to be maximally reused should itself reuse as little as possible (to avoid sprawling dependencies on external packages). I've had a difficult time paring LLVM down to something featherweight enough to pass those standards, but I'm still looking into it.

  • "Here compiler, ..." Already wrong. My question explicitly included "at runtime." A compiler is obviously compile-time. – Paul J. Lucas May 19 '15 at 20:05