
I have some non-trivial C++17 functions marked as constexpr. They perform graph-related computations (depth-first traversal) and general algorithms (e.g. find, sort, unique...).

If I try to force evaluation at compile-time by putting the result in a constexpr global variable, 3 things can happen:

  1. For small-size computations (to give an idea, let's say graphs of ~100 nodes, where nodes are more or less integers), the compilation is fine (takes ~2s).
  2. With ~500 nodes, the compilation takes ~1 min and uses 30 GB of memory (!).
  3. With ~1000 nodes, the compilation requires too much memory for me to let it finish.

If I remove the constexpr qualifiers and ask for a run-time computation, compilation and execution are very fast (less than 5s).
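
To make the setup concrete, the pattern is essentially the following (a simplified stand-in with placeholder names and a tiny hard-coded graph, not my actual code):

    #include <array>
    #include <cstddef>

    constexpr std::size_t N = 4;

    // Adjacency matrix of a toy directed graph.
    constexpr std::array<std::array<bool, N>, N> adj{{
        {false, true,  true,  false},
        {false, false, false, true },
        {false, false, false, true },
        {false, false, false, false},
    }};

    // Iterative depth-first traversal from node 0, done entirely in constexpr code.
    constexpr std::array<int, N> dfs_order()
    {
        std::array<int, N> order{};
        std::array<bool, N> visited{};
        std::array<std::size_t, N * N> stack{};
        std::size_t top = 0, out = 0;
        stack[top++] = 0;
        while (top > 0) {
            const std::size_t v = stack[--top];
            if (visited[v]) continue;
            visited[v] = true;
            order[out++] = static_cast<int>(v);
            for (std::size_t w = 0; w < N; ++w)
                if (adj[v][w] && !visited[w])
                    stack[top++] = w;
        }
        return order;
    }

    // Forcing compile-time evaluation by storing the result in a constexpr variable:
    constexpr auto order = dfs_order();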

I used g++ 8.2 with -O3 -std=c++17.

Why is it taking so long? Is g++ known to have performance problems with constexpr evaluation? What performance should I expect from constexpr functions during compilation? From what I understand, the compiler turns itself into an interpreter for constexpr computations. But I have absolutely no doubt that evaluating the same program in Python would be very fast, given the very small size of the data.

Edit: Such problems are mentioned here (blog of a GCC developer).

Bérenger
  • That is what it is. If you are not satisfied with your compiler's performance, you can return it for a full refund. But keep in mind that properly handling constexpr functions is a very, very non-trivial task. – SergeyA Mar 04 '19 at 19:42
  • @SergeyA So you would not argue with the opinion that constexpr computations are only suitable for trivial tasks? That's what it seems like, but it's not how I've been introduced to constexpr... – Bérenger Mar 04 '19 at 19:54
  • @Bérenger constexpr evaluation really is a difficult task: the compiler has to check for and prohibit any undefined behavior, and has to output a diagnostic. – Guillaume Racicot Mar 04 '19 at 19:58
  • @GuillaumeRacicot What surprises me is the discrepancy between what I observe and the claims that have been made about constexpr. For instance, are there documents where people report how slow constexpr computation can be? – Bérenger Mar 04 '19 at 20:04
  • @Bérenger I believe there are some documented cases where the memory usage (or computation time) exceeds what is expected, but these are specific to particular compiler versions / reported issues. I would suggest experimenting with different compilers to see whether one handles your case faster or lighter, and filing a detailed report with your implementation. – Guillaume Racicot Mar 04 '19 at 20:12
  • Compilers are general-purpose tools for interpreting and transforming complex data structures. A program output by the compiler will be optimised for your specific task. The compiler itself cannot be, since it must work on a wide range of code. The value of compile-time evaluation is the compiler doing the calculations once, rather than your program repeating those calculations every time it is run (many many times over). It is quite reasonable for compile-time calculations to be many orders of magnitude less efficient than executable code specifically optimised for those particular calculations. – Peter Mar 04 '19 at 20:19

1 Answer


g++ memoizes compile-time structures. What's more, compile-time structures may be created and inspected along both the branch you want to take and the one you do not, unless you are careful.

Exponential blowup is quite possible, and may be what you are seeing.

There are strategies to reduce compile time complexity. Avoid deep recursion. Pay attention to accumulated symbol length. Ensure that only the branches you want to take have to be examined.

As an example, consider something as simple as:

std::conditional_t< (A<B), make_type_A<Ts...>, make_type_B<Ts...> >

The writer of this code probably intended to make only one type, but this code requires that both types be created.
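
One way around that, sketched below with invented names (not necessarily the fix you need), is to pass the templates unapplied and only apply the one that was selected:

    #include <type_traits>

    // Illustrative "expensive" metafunctions (names invented for this sketch).
    template<class... Ts> struct make_type_A { using type = int;  /* imagine heavy work here */ };
    template<class... Ts> struct make_type_B { using type = long; /* imagine heavy work here */ };

    // Deferred selection: pick the metafunction first, then instantiate only that one.
    template<bool Cond, template<class...> class T, template<class...> class F, class... Ts>
    struct lazy_conditional { using type = typename T<Ts...>::type; };

    template<template<class...> class T, template<class...> class F, class... Ts>
    struct lazy_conditional<false, T, F, Ts...> { using type = typename F<Ts...>::type; };

    constexpr int A = 1, B = 2;
    using selected = typename lazy_conditional<(A < B), make_type_A, make_type_B, int>::type;
    static_assert(std::is_same_v<selected, int>);

The difference is that make_type_B is never handed its arguments unless the false branch is chosen, so its work is never done.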

This is unlikely to be your problem, but a similar problem could occur when running constexpr code.

For each call, work out the size of the state required. Add up the total state needed. Throw in a 10x overhead.
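
As a rough illustration (every number below is invented, not measured): suppose the DFS recurses 500 deep and each frame holds a 500-element array of ints.

    #include <cstddef>

    // Back-of-envelope only; illustrative numbers, not measurements.
    constexpr std::size_t nodes       = 500;
    constexpr std::size_t frame_state = nodes * sizeof(int); // e.g. one visited/result array per frame
    constexpr std::size_t call_depth  = nodes;               // worst-case DFS recursion depth
    constexpr std::size_t estimate    = frame_state * call_depth * 10; // with 10x overhead: ~10 MB

If the footprint you actually see (30 GB here) is several orders of magnitude above an estimate like that, intermediate values are most likely being duplicated rather than reused.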

You can also analyze the O-notation of your problem by collecting more than two completed samples. Check graphs of size 100, 200, 300, 400, and 500. Try linear graphs, trivial graphs, complete graphs, and random graphs with constant or percentage connectivity.
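
One way to set up such a sweep (GRAPH_SIZE and dummy_work are placeholders for your real constexpr entry point) is to parameterize the size and drive it from the command line:

    #include <array>
    #include <cstddef>

    // bench.cpp -- stand-in for the real constexpr graph computation.
    template<std::size_t N>
    constexpr std::array<int, N> dummy_work()
    {
        std::array<int, N> a{};
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                a[i] += static_cast<int>(j);
        return a;
    }

    #ifndef GRAPH_SIZE
    #define GRAPH_SIZE 100
    #endif

    constexpr auto sample = dummy_work<GRAPH_SIZE>();

    // Then sweep the size from the shell and record time / peak memory, e.g.:
    //   for n in 100 200 300 400 500; do
    //     /usr/bin/time -v g++ -std=c++17 -O3 -c -DGRAPH_SIZE=$n bench.cpp
    //   done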

The O-notation of the growth in compile time might help you narrow down where the problem is. If it is linear, polynomial or exponential you are going to be looking at different kinds of problems.

Linear with a sharp inflection means you are hitting a resource bottleneck. Maybe memory. Start graphing other resource usage and see if you can find the bottleneck.

Exponential looks a lot like linear with a cliff if you don't log-graph it and zoom in on the "cliff". There may be a narrow portion where the exponential part leaves the constant factor behind.

Polynomial gets interesting. The order of the polynomial (a log graph can help find that) can tell you what kind of operation is screwing you over, much like knowing a conventional algorithm is O(n^3) means you are looking for a triple loop. An O(n^3) compile time means you are somehow instantiating the equivalent of a triple loop.

Yakk - Adam Nevraumont