What are the advantages and use cases of CRTP compile-time polymorphism?

Question

I see that we can introduce a kind of compile-time polymorphism using CRTP, but I wonder how this can be better than good old virtual functions. In the end we have to call `static_cast<const T*>(this)->implementation();`, which looks like one level of indirection, just as with a vtable. How are they different? Are there any advantages? The only difference I see is a downside: objects cannot be destroyed through the base class. To me it looks like a nice academic example, but one that is dominated by regular polymorphism.

According to Agner's optimization book, "The few clock cycles that we may save by avoiding the virtual function dispatch mechanism is rarely enough to justify such a complicated code that is difficult to understand and therefore difficult to maintain. If the compiler is able to do the devirtualization automatically then it is certainly more convenient to rely on compiler optimization than to use this complicated template method." — Nimrod, Dec 09 '21 at 09:25
Consider [this clone pattern](https://stackoverflow.com/q/9422760/580083). Without CRTP, you would need to define the `clone()` virtual function in each derived class. With CRTP, you have to define it only once. CRTP is not an alternative here to virtual functions. Instead, both mechanisms are combined together. — Daniel Langr, Dec 09 '21 at 09:28
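
The clone pattern mentioned in the comment above could look roughly like the following. This is only a minimal sketch, not the code from the linked question: the names `Shape`, `Cloneable`, `Triangle`, `Square` and `sides()` are hypothetical, and it assumes C++14 or later for `std::make_unique`.

```cpp
#include <memory>

// Ordinary polymorphic interface: clone() is virtual.
struct Shape {
    virtual ~Shape() = default;
    virtual std::unique_ptr<Shape> clone() const = 0;
    virtual int sides() const = 0;
};

// CRTP layer (hypothetical name): implements clone() once for every Derived,
// so the derived classes do not have to repeat it.
template <typename Derived>
struct Cloneable : Shape {
    std::unique_ptr<Shape> clone() const override {
        // Copy-construct the most derived type, which is known at compile time.
        return std::make_unique<Derived>(static_cast<const Derived&>(*this));
    }
};

struct Triangle : Cloneable<Triangle> {
    int sides() const override { return 3; }
};

struct Square : Cloneable<Square> {
    int sides() const override { return 4; }
};
```

Here the virtual call still selects the right `clone()` at run time, while CRTP only removes the boilerplate of writing it in each derived class.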

score 2 · Accepted Answer · answered Dec 09 '21 at 10:16

The reason is performance.

`static_cast<const T*>(this)->implementation();` is resolved at compile time to the address of the corresponding `T::implementation()` overload:

    CALL <fixed-address>

A virtual member call, on the other hand, is generally resolved at run time using an indirect call via an offset in a vtable. In the simplest cases the optimizer can resolve it at compile time (devirtualization), but there is no way to rely on this. So generally you can expect the code to look like this:

    MOV rax, <vtable-ptr>
    MOV rax, [rax+<offset>] ; Indirection via a vtable
    CALL rax

This type of call will most likely incur a pipeline stall on the first invocation because of the data dependency on the target address, and the cost of subsequent invocations depends heavily on the quality of the branch predictor.

The static call, on the other hand, is very fast because its target is known in advance and it does not involve such a stall. The compiler can also inline it.
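
To make the comparison concrete, here is a minimal sketch of the two dispatch styles side by side. All names (`ShapeCrtp`, `SquareCrtp`, `ShapeVirtual`, `SquareVirtual`, `area_static`, `area_via_base`) are hypothetical and chosen only for illustration; the code the compiler actually emits depends on the compiler and optimization level.

```cpp
// Static (CRTP) dispatch: the base class knows the derived type T at compile time.
template <typename T>
struct ShapeCrtp {
    double area() const {
        // The target is fixed at compile time, so this can be a direct call
        // or be inlined entirely.
        return static_cast<const T*>(this)->implementation();
    }
};

struct SquareCrtp : ShapeCrtp<SquareCrtp> {
    double side = 2.0;
    double implementation() const { return side * side; }
};

// Dynamic dispatch: the target is looked up through the vtable at run time,
// unless the optimizer manages to devirtualize the call.
struct ShapeVirtual {
    virtual ~ShapeVirtual() = default;
    virtual double area() const = 0;
};

struct SquareVirtual : ShapeVirtual {
    double side = 2.0;
    double area() const override { return side * side; }
};

// Generic code over CRTP types must itself be a template.
template <typename T>
double area_static(const ShapeCrtp<T>& shape) {
    return shape.area();
}

// Generic code over the virtual hierarchy is an ordinary function.
double area_via_base(const ShapeVirtual& shape) {
    return shape.area();
}
```

The trade-off is visible in the last two functions: the CRTP version needs the concrete type at every call site, whereas the virtual version works through a single base-class reference and supports destruction through the base class.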
