How do C++ compilers optimize template code?

Question

How do compilers avoid linear growth in the size of the compiled binary with each new type instantiation of a template?

I don't see how we can avoid making a copy of all the templated code when a new instantiation is used.

I feel that compile times and binary sizes would be made extremely unwieldy for all but the simplest templates in a reasonably large code base. But their prevalence suggests that compilers are able to do some magic to make them practical.

They don't. (Though they might do some coalescing for same types) That's why it's possible to crash a C++ compiler with a few lines of carefully written template code. — Mysticial, Dec 11 '13 at 20:29
A class template instantiation would appear in your exe once, no matter how many times you created an instance of it. Same with function templates. You don't get a massive executable just because you declared 1000 variables of type vector. — , Dec 11 '13 at 20:31
Here's an example of one that broke almost all the compilers we tried: http://chat.stackoverflow.com/transcript/message/3657411#3657411 — Mysticial, Dec 11 '13 at 20:33
And another one which crashed ICC: http://chat.stackoverflow.com/transcript/message/3657932#3657932 — Mysticial, Dec 11 '13 at 20:35
Templates are either instantiated once per compilation unit where they're used, and it is then the linker's job to merge a thousand identical pieces of code into one; or they are explicitly instantiated by the user in exactly one place, and not otherwise. One is the "Borland" model, the other is the "Cfront" model, but I forgot which is which. Or, of course, a template may result in a lot of compile-time computation and no code at all (merely a type or a constant). — Damon, Dec 11 '13 at 20:37
It's up to you to reduce code bloat, for instance by putting stuff in a *single* (non-template) base class and using `void *` with type-safe casts in the templates, which reduces the amount of code the templates instantiate. See http://eugenedruy.wordpress.com/2009/07/19/refactoring-template-bloat/ for instance — Alexander Oh, Dec 11 '13 at 20:43

score 6 · Answer 1 · answered Dec 11 '13 at 20:47

Many template functions are small enough to inline effectively, so you do get linear growth in the binary - but it is no more than you would get with equivalent non-template functions.

The One Definition Rule is important here, as it allows the compiler to assume that any template instantiation with identical template parameters generates identical code. If it detects that the template function has already been instantiated earlier in a source file, it can use that copy instead of generating a new one. Name mangling makes it possible for a linker to recognize the same function from different compiled sources. None of this is guaranteed since your program shouldn't be able to tell the difference between identical copies of a function, but compilers do harder optimizations than this every day.

The one time that duplicates are required to be filtered out is when a function contains a static variable - there can only be one copy. But that can be achieved either by filtering out the duplicate functions, or filtering out the static variables themselves.

score 5 · Answer 2 · answered Dec 11 '13 at 20:50

There are multiple things which keep multiple instantiations from being too harmful to the executable size:

Many templates just pass things through to another layer. Although there may be quite a bit of code, it mostly disappears once the code is instantiated and inlined. Inlining [and some other optimizations] can easily result in bigger code, but inlining small functions often results in smaller (and faster) code: the calling sequence that would otherwise be necessary often requires more instructions than the inlined body, and the optimizer gets a better chance to reduce the code further from a more holistic view of what's going on.
Where template code isn't inlined, duplicate instantiations in different translation units need to be merged into just one instantiation. I'm not a linker expert but my understanding is that, e.g., ELF uses different sections and the linker can choose to include only those sections which are actually used.
In bigger executables you'll need some vocabulary types and instantiations which are used in many places and are effectively shared. Doing everything using a custom type would be a bad idea, and type erasure is certainly an important tool to avoid too many types.

That said, where possible it does pay off to preinstantiate templates, especially if only a small number of instantiations are generally used. A great example is the IOStreams library, which is unlikely to be used with more than 4 character types (typically it is used with just one): moving the template definitions and their instantiations into separate translation units may not reduce the executable size but will certainly reduce the compile time! Starting with C++11 it is possible to declare template instantiations as extern, which allows the definitions to be visible without being implicitly instantiated for specializations which are known to be instantiated elsewhere.

Also, I think that the OP is starting from the (wrong) idea that templates = functions. Templates in C++ are an example of metaprogramming: a technology that generates software inside your software, whereas a function is a more imperative and less abstract concept. There are also different mechanisms involved. For example, a partial specialization of a template is too often called "overloading", and this could lead someone to think that it works just like function overloading at runtime. — user2485710, Dec 11 '13 at 20:59

score 3 · Answer 3 · answered Dec 11 '13 at 20:47

I think you're misunderstanding how templates are implemented. Templates are compiled on a need-to-use basis into a corresponding class/function.

Consider the following code...

template <typename Type>
Type mymax(Type a, Type b) {
    return a > b ? a : b;
}

int main(int argc, char** argv)
{
}

Compiling this, I get the following assembly.

    .file   "example.cpp"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1"
    .section    .note.GNU-stack,"",@progbits

You'll notice it only contains the main function. Now I update my code to use the template function.

int main(int argc, char** argv)
{
    mymax<double>(3,4);
}

Compiling that, I get a much longer assembly output that includes the template function instantiated for doubles. The compiler saw that the template function was used with the type "double", so it generated a function to handle that case.

    .file   "example.cpp"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    movabsq $4616189618054758400, %rdx
    movabsq $4613937818241073152, %rax
    movq    %rdx, -24(%rbp)
    movsd   -24(%rbp), %xmm1
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    call    _Z5mymaxIdET_S0_S0_
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .section    .text._Z5mymaxIdET_S0_S0_,"axG",@progbits,_Z5mymaxIdET_S0_S0_,comdat
    .weak   _Z5mymaxIdET_S0_S0_
    .type   _Z5mymaxIdET_S0_S0_, @function
_Z5mymaxIdET_S0_S0_:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movsd   %xmm0, -8(%rbp)
    movsd   %xmm1, -16(%rbp)
    movsd   -8(%rbp), %xmm0
    ucomisd -16(%rbp), %xmm0
    jbe .L9
    movq    -8(%rbp), %rax
    jmp .L6
.L9:
    movq    -16(%rbp), %rax
.L6:
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size   _Z5mymaxIdET_S0_S0_, .-_Z5mymaxIdET_S0_S0_
    .ident  "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1"
    .section    .note.GNU-stack,"",@progbits

Now let's say I change the code to use that function twice.

int main(int argc, char** argv)
{
    mymax<double>(3,4);
    mymax<double>(4,5);

}

Again, let's look at the assembly it creates. It's comparable to the previous output because most of that code was just the compiler creating the function mymax where "Type" is changed to double. No matter how many times I use that function, it will only be instantiated once: main simply contains a second call to the same symbol.

    .file   "example.cpp"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    movabsq $4616189618054758400, %rdx
    movabsq $4613937818241073152, %rax
    movq    %rdx, -24(%rbp)
    movsd   -24(%rbp), %xmm1
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    call    _Z5mymaxIdET_S0_S0_
    movabsq $4617315517961601024, %rdx
    movabsq $4616189618054758400, %rax
    movq    %rdx, -24(%rbp)
    movsd   -24(%rbp), %xmm1
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    call    _Z5mymaxIdET_S0_S0_
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .section    .text._Z5mymaxIdET_S0_S0_,"axG",@progbits,_Z5mymaxIdET_S0_S0_,comdat
    .weak   _Z5mymaxIdET_S0_S0_
    .type   _Z5mymaxIdET_S0_S0_, @function
_Z5mymaxIdET_S0_S0_:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movsd   %xmm0, -8(%rbp)
    movsd   %xmm1, -16(%rbp)
    movsd   -8(%rbp), %xmm0
    ucomisd -16(%rbp), %xmm0
    jbe .L9
    movq    -8(%rbp), %rax
    jmp .L6
.L9:
    movq    -16(%rbp), %rax
.L6:
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size   _Z5mymaxIdET_S0_S0_, .-_Z5mymaxIdET_S0_S0_
    .ident  "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1"
    .section    .note.GNU-stack,"",@progbits

So basically templates don't affect the executable size any more than writing the functions by hand. It's just a convenience. The compiler creates one function per type the template is used with, so whether I use it 1 or 1000 times with that type, there will only be one instance of it. If I update my code to also handle a new type such as float, I'll get another function in my executable, but only one, no matter how many times I use it.

He said "each **new** type instantiation of a template?". mymax(3,4); would increase the size of the executable. — Mustafa Ozturk, Dec 11 '13 at 20:51
He knows that. The answer he linked to says that, too. I'm just elaborating on that answer, showing that template functions are added to the executable as needed and that multiple "instantiations", using the nomenclature of the linked question, don't increase executable size much because multiple instantiations will use the same function. — voodoogiant, Dec 11 '13 at 23:26
