I have a simple template header containing three function templates (no separate declarations, just definitions, all marked static inline), two of which are about 5000 lines long. These long functions are very simple; they are long only because they are written in straight-line form, with no loops. In my main program file, where I use an instantiation of the template, the program runs about 10x slower if I include the template header directly than if I build a separate C++ file that includes the template and instantiates it, and then link to it as a static library (built with -fPIC). Why?
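To make the two setups concrete, here is a minimal sketch. The names (kernel, kernel_double, and the file names in the comments) are hypothetical stand-ins, the real bodies are about 5000 lines of straight-line arithmetic, and the non-template wrapper is only one assumed way the separate file could expose the instantiation:

```cpp
// --- kernel.hpp (hypothetical name): the header-only template ---
// Stand-in for one of the long functions; the real body is ~5000 lines
// of straight-line arithmetic with no loops.
template <typename T>
static inline T kernel(T x) {
    T t0 = x * x;   // each statement depends only on earlier temporaries
    T t1 = t0 + x;
    T t2 = t1 * t0;
    return t2 + t1;
}

// --- Setup A (slow case): main.cpp includes kernel.hpp directly ---
// The call site and the full template body are in the same translation
// unit, so the optimizer sees (and may inline) the whole function.

// --- Setup B (fast case): kernel_lib.cpp, built into a static library ---
// Assumption: because kernel is static inline, the instantiation is
// exposed through a wrapper with external linkage. main.cpp would only
// see the declaration `double kernel_double(double);`.
double kernel_double(double x) {
    return kernel<double>(x);
}
```

In setup A, main.cpp calls kernel<double>(x) directly; in setup B it calls kernel_double(x) and links against the library, so the long body is compiled in its own translation unit.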
Is the compiler too slow, is the instruction cache getting thrashed, did the compiler suddenly inline the long functions when it shouldn’t have, or is it something else?
The code is compiled with aggressive optimization, using the flags -O3 -ffast-math -march=native -std=gnu++11, with GCC 5.5.0 on macOS 10.14.3.