1

Here is an example for profiling, on macos 10.15.4, using clion with https://www.jetbrains.com/help/clion/cpu-profiler.html , I found it only ouput a part of call stacks.

#include <iostream>
#include <thread>
#include <cmath>

void g() {
    std::this_thread::sleep_for(std::chrono::seconds(1));
}

float l2sqr(float* x, float* y, size_t N) {
    float ret = 0;
    for (size_t i = 0; i<N;i++) {
        ret += (x[i]-y[i])*(x[i]-y[i]);
    }
    return std::sqrt(ret);
}

int main() {
    float x[512];
    float y[512];
    std::fill_n(x, 512, 0.1);
    std::fill_n(y, 512, 0.2);
    float s = 0.0;
    for (int i = 0; i < 5; ++i) {
        for (int j = 0; j < 10000; ++j) {
            s += l2sqr(x, y, 512);
        }
        g();
        std::cout << s << std::endl;
    }
}

and here is the result, it did not give any sample for function g. In the real case, lots of db io time are not reported, and give me the wrong hot spot. Is it the correct behavior or am I using it a wrong way?

enter image description here

Using perf in ubuntu 16.04:

g++ test.cpp && sudo perf record -F 999 -g ./a.out && sudo perf report

I got a similar result:

   - main                                                                                                                                           
      + 99.67% l2sqr                                                                                                                                
        0.08% std::sqrt  
onriv
  • 145
  • 1
  • 7
  • Perhaps `g()` got inlined? Although it doesn't explain why `l2sqr` is 100% of the parent. – Kentzo Sep 18 '20 at 14:20
  • @Kentzo yeah I tought it's strange too. Using perf in ubuntu 1604, I got l2sqr 97.99% samples of the parent – onriv Sep 18 '20 at 16:57
  • @Kentzo: `g++` defaults to no optimization, which includes no inlining. Usually not useful or interesting to benchmark, but I guess if you want to test how perf works with function calls and sleep system calls, it does make things "simple" by making the asm be mostly a naive translation of the source. – Peter Cordes Sep 18 '20 at 17:14
  • @PeterCordes I wanna get the time usage distribution of my program (the example is simplified, in the real case it mixes man cpu oriented computation and some db access). I profiled it and the result show that the hotspot is some heavily cpu intensive function but at the end I found that the db access took the majority time consumption. – onriv Sep 18 '20 at 17:19
  • @PeterCordes I think maybe I (and clion) set a wrong configuration. There should be some configuration to sample stacks even though the process is blocked? – onriv Sep 18 '20 at 17:21
  • 1
    IIRC, the `cycles` event can only fire when code is using CPU time, not while it's sleeping (including blocked on I/O). Unless `0xd5e258d4c544155` is the address of the sleep function `g()` calls? I forget if a software event like `task-clock` can profile in wall-clock instead of CPU time, or if perf has other modes of operation. (I normally only need to profile a hot loop I'm tuning.) – Peter Cordes Sep 18 '20 at 17:25
  • @PeterCordes I see... I am quite confused by the result (0xd5e258d4c544155 is dued to some copy&pate typo maybe, I ran it again). Maybe I should just add more trace log. I come from the java programmer's world and not qutie familiar with these toolchains. – onriv Sep 18 '20 at 17:34
  • 1
    Just for comparison you may try the Instruments.app on macOS with the Time Profiler template. AFAIK it uses dtrace under the hood, except that its d script might be more sophisticated. – Kentzo Sep 18 '20 at 18:16
  • 1
    @onriv, default mode of perf does not profile for sleeping or blocked processes. There are some ways to use perf for Off-CPU analysis which profiles sleeping processes, and On-CPU and Off-CPU can be combined. Check Gregg pages: http://www.brendangregg.com/offcpuanalysis.html http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html and “Hot Cold Graph” for combined http://www.brendangregg.com/FlameGraphs/hotcoldflamegraphs.html. – osgx Sep 19 '20 at 12:21
  • @osgx thanks, these pages help me a lot! – onriv Sep 19 '20 at 15:39

0 Answers0