
I discovered that compiling a relatively small amount of code that converts lambda functions to std::function<> values can take a very long time, in particular with the Clang compiler.

Consider the following dummy code that creates 100 lambda functions:

#if MODE==1
#include <functional>
using LambdaType = std::function<int()>;
#elif MODE==2
using LambdaType = int(*)();
#elif MODE==3
#include "function.h" // https://github.com/skarupke/std_function
using LambdaType = func::function<int()>;
#endif

static int total=0;

void add(LambdaType lambda)
{
    total += lambda();
}

int main(int argc, const char* argv[])
{
    add([]{ return 1; });
    add([]{ return 2; });
    add([]{ return 3; });
    // 96 more such lines...
    add([]{ return 100; });

    return total == 5050 ? 0 : 1;
}
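To reproduce the measurement, the elided lines can be generated rather than typed. This is a minimal sketch, assuming the pattern continues as shown (lambda i returns i); generate_main is a hypothetical helper and not part of the original test file:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns the text of main() for `count` lambdas,
// following the pattern shown above (lambda i returns i).
std::string generate_main(int count)
{
    std::ostringstream out;
    out << "int main(int argc, const char* argv[])\n{\n";
    for (int i = 1; i <= count; ++i)
        out << "    add([]{ return " << i << "; });\n";
    // The sum 1 + 2 + ... + count equals count * (count + 1) / 2.
    out << "\n    return total == " << count * (count + 1) / 2
        << " ? 0 : 1;\n}\n";
    return out.str();
}
```

Writing the result of generate_main(100) after the preamble (the #if block, total, and add) should give a file equivalent to lambdas.cpp.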

Depending on the MODE preprocessor macro, the code selects one of the following three ways to pass a lambda function to the add function:

  1. std::function<> template class
  2. a simple C pointer to function (possible here only because there is no capture)
  3. a fast replacement to std::function written by Malte Skarupke (https://probablydance.com/2013/01/13/a-faster-implementation-of-stdfunction/)
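The remark about captures in item 2 can be checked at compile time. This is a minimal sketch; call and demo_captureless are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <type_traits>

using FnPtr = int (*)();

// Hypothetical stand-in for add() in MODE 2: takes a plain function pointer.
int call(FnPtr f) { return f(); }

int demo_captureless()
{
    auto one = [] { return 1; };          // no capture: converts to int(*)()
    int x = 2;
    auto captures_x = [x] { return x; };  // has a capture: no conversion exists

    static_assert(std::is_convertible<decltype(one), FnPtr>::value,
                  "a captureless lambda converts to a function pointer");
    static_assert(!std::is_convertible<decltype(captures_x), FnPtr>::value,
                  "a capturing lambda does not convert");

    // Only the captureless lambda can go through call(); the capturing
    // one has to be invoked directly (or wrapped, e.g. in std::function).
    return call(one) + captures_x();
}
```

A closure with captures has to carry state, which a bare function pointer cannot hold, so MODE 2 would stop compiling as soon as one of the lambdas captured a variable.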

Whatever the mode, the program always exits with a regular exit code of 0. But now look at the compilation time with Clang:

$ time clang++ -c -std=c++11 -DMODE=1 lambdas.cpp 
real    0m8.162s
user    0m7.828s
sys     0m0.318s

$ time clang++ -c -std=c++11 -DMODE=2 lambdas.cpp 
real    0m0.109s
user    0m0.056s
sys     0m0.046s

$ time clang++ -c -std=c++11 -DMODE=3 lambdas.cpp 
real    0m0.870s
user    0m0.814s
sys     0m0.051s

$ clang++ --version
Apple LLVM version 10.0.0 (clang-1000.11.45.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Wow. The std::function mode takes roughly 75 times longer to compile than the function pointer mode! And it still takes more than 9 times longer than the std::function replacement.

How can that be? Is it a performance problem specific to Clang, or is it due to the inherent complexity of the std::function requirements?

I also compiled the same code with GCC 5.4 and Visual Studio 2015. The compile time differences are big there as well, but not as extreme.

GCC:

$ time g++ -c -std=c++11 -DMODE=1 lambdas.cpp 
real    0m1.179s
user    0m1.080s
sys     0m0.092s

$ time g++ -c -std=c++11 -DMODE=2 lambdas.cpp 
real    0m0.136s
user    0m0.120s
sys     0m0.012s

$ time g++ -c -std=c++11 -DMODE=3 lambdas.cpp 
real    0m1.994s
user    0m1.792s
sys     0m0.196s

Visual Studio:

C:\>ptime cl /c /DMODE=1 /EHsc /nologo lambdas.cpp
Execution time: 2.411 s

C:\>ptime cl /c /DMODE=2 /EHsc /nologo lambdas.cpp
Execution time: 0.270 s

C:\>ptime cl /c /DMODE=3 /EHsc /nologo lambdas.cpp
Execution time: 1.122 s

I am now considering using Malte Skarupke's implementation, both for slightly better runtime performance and for a big compile time improvement.

Yakk - Adam Nevraumont
prapin
  • Did you take a look at the generated assembly? Maybe there is some fancy optimization going on. – pschill Sep 25 '18 at 16:42
  • Fun fact: With `-O3` gcc optimizes the function pointer version to 37 lines of assembly (on my machine). I think it can compute the result 5050 at compile time. For comparison: For `std::function` gcc generates about 8000 lines of assembly. There seems to be a lot of complexity involved when using `std::function`, which would explain the compile time. – pschill Sep 25 '18 at 17:37
  • @pschill Interesting observation, GCC can indeed make great optimization in some cases! I deliberately measure compile time without optimization. And no, I didn't think to look at generated code for comparison. – prapin Sep 26 '18 at 17:55

3 Answers


Have a look at what the compiler has to process in each case with the --save-temps option. On my machine with clang 6.0.1, MODE=1 generates a 575K preprocessed file, due to the multitude of standard library headers being included. MODE=2 generates a 416 byte file, more than 1000 times smaller. The generated assembly also differs in size by a factor of 10.

NeroP

I can't test and interpret your example myself. However, from Clang 9.0.0 on, the compiler can record a time trace of your compilation. See the Phoronix article for an impression and links to more info. In short, adding -ftime-trace to the command line produces a JSON file describing what the compiler is doing, which you can visualize as a nice graphic.

If you notice something really strange, you can always log a bug at bugs.llvm.org with a good reproduction (I think this question, with some changes to the wording, would be fine).

Let me also add a small comment about the test code. I'm not surprised that std::function is slower to compile, as it requires an extra include to parse (and standard library includes are huge). As for run time, the slowdown is logical too, as std::function adds a lot of extra indirection that can't be optimized away.

I would recommend adding a 4th case where add is a template and the function type is the template argument:

template<typename LambdaType>
void add(LambdaType &&lambda)
{
    total += lambda();
}
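The suggested fourth case can be tried in isolation. This is a minimal sketch; run_template_mode is a hypothetical driver, and no compile times were measured for it here:

```cpp
#include <cassert>

static int total = 0;

// The callable's own type is the template argument, so no type-erasing
// wrapper such as std::function is constructed.
template <typename LambdaType>
void add(LambdaType&& lambda)
{
    total += lambda();
}

// Hypothetical driver: every lambda expression has its own closure type,
// so add() is instantiated once per call below.
int run_template_mode()
{
    total = 0;
    int offset = 10;
    add([] { return 1; });
    add([] { return 2; });
    add([offset] { return offset; });  // captures work here, unlike MODE 2
    return total;
}
```

The trade-off is one instantiation of add per lambda, in exchange for calls that the compiler can see through directly.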
JVApen

I also encountered something similar, but related to RAM usage:

I have an RTTI library that wraps a lot of type-related functions (like constructors and destructors) in lambdas and stores them in std::function objects. Since the reflector is instantiated for each type in use, it had an enormous RAM footprint (compilation alone required about 80 GB of memory).

After a lot of pondering and searching for self-inflicted pathological metaprogramming, I narrowed the problem down to std::function, and was able to lower RAM usage from 80 GB to 4 GB just by using raw function pointers and the +lambda trick (applying unary + to a captureless lambda converts it to a function pointer).
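For readers unfamiliar with the trick, here is a minimal sketch of how per-type operations might be stored as raw function pointers. TypeOps, make_type_ops, Counter, and demo_type_ops are hypothetical names for illustration, not the actual library's code:

```cpp
#include <cassert>
#include <new>
#include <type_traits>

// Hypothetical per-type record: raw function pointers instead of
// std::function members.
struct TypeOps {
    void (*construct)(void* storage);
    void (*destroy)(void* object);
};

template <typename T>
TypeOps make_type_ops()
{
    // Unary + makes the captureless lambda decay to a function pointer,
    // so `auto` deduces void(*)(void*) rather than the closure type.
    auto construct = +[](void* storage) { ::new (storage) T(); };
    auto destroy   = +[](void* object) { static_cast<T*>(object)->~T(); };
    static_assert(std::is_same<decltype(construct), void (*)(void*)>::value,
                  "+lambda yields a plain function pointer");
    return TypeOps{construct, destroy};
}

// Small test type that counts live instances.
struct Counter {
    static int alive;
    Counter() { ++alive; }
    ~Counter() { --alive; }
};
int Counter::alive = 0;

// Constructs and destroys a Counter through the function pointers and
// returns the number of live instances seen in between.
int demo_type_ops()
{
    TypeOps ops = make_type_ops<Counter>();
    alignas(Counter) unsigned char buffer[sizeof(Counter)];
    ops.construct(buffer);
    int during = Counter::alive;
    ops.destroy(buffer);
    return during;
}
```

Each std::function construction instantiates wrapper machinery for the specific closure type, whereas a function pointer member needs none. That is a plausible source of the difference described above, though it was not measured here.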

The RAM overhead seemed to be consistent across all compilers I currently use:

  • Clang 15.0.1 (MSVC's clang-cl variant)
  • MSVC 19.35.32215.0

My guess is that there's some fundamental metaprogramming booboo involved with all std::function implementations.

Dimo Markov