
I'm using valgrind --tool=callgrind to profile a critical part of my C++ program.

The part itself takes less than a microsecond to execute, so I'm profiling a large number of iterations of a loop over that part.
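A stripped-down version of that setup looks roughly like the sketch below (`critical_section()` is just a stand-in for the real code):

```cpp
#include <cstdint>

// Stand-in for the real sub-microsecond critical section.
// volatile keeps the compiler from optimising the work away.
volatile std::uint64_t sink = 0;
void critical_section() { sink = sink + 1; }

int main() {
    // Repeat the tiny region many times so its instruction counts
    // accumulate to something visible in the profile.
    for (std::uint64_t i = 0; i < 10000000; ++i) {
        critical_section();
    }
    return 0;
}

// Profiled with something like:
//   valgrind --tool=callgrind --dump-instr=yes ./a.out
//   kcachegrind callgrind.out.<pid>
```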

I noticed that instruction costs come in multiples of 0.13% of the program's total execution cost, so I only see 0.13, 0.26, 0.52, and so on.

My question is: should I assume that this atomic quantity corresponds to one CPU cycle? See the screenshot below (the callgrind output is viewed graphically with KCachegrind).

[Screenshot: KCachegrind view of the callgrind output, with per-instruction costs shown as percentages]

Edit: By the way, looking at the machine code, I see that a mov takes 0.13%, so that's probably one clock cycle indeed.
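If one mov really is the smallest unit at 0.13%, then the total the percentages are taken against works out to roughly 100 / 0.13 ≈ 770 such units.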

  • I would only expect this kind of behavior from a sampling profiler such as [perf](https://en.wikipedia.org/wiki/Perf_(Linux)). Have you tried profiling with [--dump-instr=yes](http://valgrind.org/docs/manual/cl-manual.html#opt.dump-instr) ? – xbug Nov 26 '14 at 13:55
  • Just tried with --dump-instr=yes, I get the same output - multiples of 0.13% – haelix Nov 26 '14 at 14:19

1 Answer


Callgrind doesn't measure CPU time. It measures instruction reads; that's where the "Ir" term comes from. If the costs come in multiples of 0.13% (especially since you confirmed it with a mov), that means each 0.13% step is a single instruction read. There are also cache-simulation options that let it estimate how likely you are to have cache misses.

Note that not all instructions take the same time to execute, so the percentages do not exactly match the amount of time spent in each section. However, they still give you a good idea of where your program is doing the most work, and likely spending the most time.
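If you want the counters to cover only the region being measured, one option is Callgrind's client requests; a minimal sketch, with `critical_section()` again standing in for the real code:

```cpp
#include <valgrind/callgrind.h>
#include <cstdint>

// Stand-in for the real code under test.
volatile std::uint64_t sink = 0;
void critical_section() { sink = sink + 1; }

int main() {
    // Run with collection initially off, e.g.:
    //   valgrind --tool=callgrind --collect-atstart=no ./a.out
    // Add --cache-sim=yes to also get the simulated cache-miss counters.
    CALLGRIND_TOGGLE_COLLECT;          // start counting events (Ir, ...)
    for (int i = 0; i < 10000000; ++i) {
        critical_section();
    }
    CALLGRIND_TOGGLE_COLLECT;          // stop counting
    CALLGRIND_DUMP_STATS;              // write the collected counters out
    return 0;
}
```

The counts you then see in KCachegrind come only from what was executed between the two toggles.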

  • OK, but wouldn't an "instruction read" normally be executed in one clock cycle? – haelix Jan 11 '15 at 18:59
  • There are tons of factors that can change the timing - caching is the most obvious one. The execution time of each instruction also varies depending on what sort of instruction it is, pipelining, branching, and other factors. There's no good way to measure this with a profiling tool, because measuring it would change what the CPU is doing and change all your measurements. That's why callgrind settles for just counting Ir's instead - no matter what the profiler is doing, it can ensure that the program logic stays the same (unless the program behavior is sensitive to timing changes). – Katie Jan 12 '15 at 14:48