Here is a profiling example on macOS 10.15.4 using CLion's CPU profiler (https://www.jetbrains.com/help/clion/cpu-profiler.html). I found that it only outputs part of the call stacks. The test program:
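#include <algorithm>  // std::fill_n
#include <chrono>     // std::chrono::seconds
#include <cmath>      // std::sqrt
#include <cstddef>    // std::size_t
#include <iostream>
#include <thread>     // std::this_thread::sleep_for

// Blocks the calling thread for one second without doing any computation.
void g() {
    std::this_thread::sleep_for(std::chrono::seconds(1));
}

// Euclidean (L2) distance between two vectors of length N.
float l2sqr(float* x, float* y, std::size_t N) {
    float ret = 0;
    for (std::size_t i = 0; i < N; i++) {
        ret += (x[i] - y[i]) * (x[i] - y[i]);
    }
    return std::sqrt(ret);
}

int main() {
    float x[512];
    float y[512];
    std::fill_n(x, 512, 0.1f);
    std::fill_n(y, 512, 0.2f);
    float s = 0.0f;
    for (int i = 0; i < 5; ++i) {
        // CPU-bound work
        for (int j = 0; j < 10000; ++j) {
            s += l2sqr(x, y, 512);
        }
        // Sleeping, not CPU-bound
        g();
        std::cout << s << std::endl;
    }
}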
In the profiling result, there were no samples at all for function g. In my real application, a lot of time spent on database I/O is not reported either, so the profiler points me to the wrong hot spots. Is this the expected behavior, or am I using the profiler the wrong way?
Using perf on Ubuntu 16.04:
g++ test.cpp && sudo perf record -F 999 -g ./a.out && sudo perf report
I got a similar result:
- main
+ 99.67% l2sqr
0.08% std::sqrt