My GCC version is 12.2.0. It has a flag -fconstexpr-cache-depth:
-fconstexpr-cache-depth=n
Set the maximum level of nested evaluation depth for C++11 constexpr functions that will
be cached to n. This is a heuristic that trades off compilation speed (when the cache
avoids repeated calculations) against memory consumption (when the cache grows very large
from highly recursive evaluations). The default is 8. Very few users are likely to want
to adjust it, but if your code does heavy constexpr calculations you might want to
experiment to find which value works best for you.
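For context, the kind of "highly recursive evaluation" I understand this description to be about looks something like the sketch below. The function fib is my own illustrative example, not something from the GCC documentation. During one evaluation of fib(n), the same sub-call fib(k) is requested many times, so reusing earlier results would save work. Whether and how GCC actually reuses them here is my assumption from the wording, not something I have verified.

```cpp
// Naive doubly recursive Fibonacci (hypothetical example).
// Evaluating fib(n) requests the same fib(k) repeatedly, which is the
// situation where remembering results of identical calls would pay off.
constexpr long long fib(int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Forces evaluation at compile time.
static_assert(fib(20) == 6765, "fib(20) is evaluated during compilation");
```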
I'm curious when this flag actually takes effect, so I borrowed a piece of code from this answer. Here is the code:
#include <array>
#include <vector>
#include <algorithm>
constexpr auto primes_num_vector = [] {
    constexpr int N = 1 << 16;
    std::vector<int> ret;
    bool not_prime[N] = {};
    for (int i = 2; i < N; i++) {
        if (!not_prime[i]) {
            ret.push_back(i);
            for (int j = 2 * i; j < N; j += i) not_prime[j] = true;
        }
    }
    return ret;
};
constexpr auto primes = [] {
    std::array<int, primes_num_vector().size()> ret;
    std::ranges::copy(primes_num_vector(), ret.begin()); // comment out this line in the second bench
    return ret;
}();
Since this constexpr function takes a noticeable amount of time to evaluate, I can tell whether its result is cached by measuring the compilation time. Here is the command I used for the measurement:
$ hyperfine 'g++ -std=c++20 -Wall -Wextra -pedantic test.cpp -o /tmp/bin/test'
This is the result of the original code:
Time (mean ± σ): 1.365 s ± 0.030 s [User: 1.317 s, System: 0.045 s]
Range (min … max): 1.336 s … 1.429 s 10 runs
This is the result after I commented out the ranges::copy line:
Time (mean ± σ): 768.4 ms ± 15.4 ms [User: 729.7 ms, System: 37.6 ms]
Range (min … max): 752.9 ms … 793.7 ms 10 runs
The second call adds roughly 0.6 s of compilation time, so the result of the first call is evidently not cached.
I suspected that GCC might simply not handle constexpr lambdas here, so I changed primes_num_vector into an ordinary named function, as shown below, and benchmarked again.
constexpr auto primes_num_vector() {
    // same as before
}
It turned out that the benchmark results stayed the same.
In what cases can GCC cache the result of this constexpr function?