I am programming in C++ (on Linux) and I have recently started to use Valgrind/Callgrind to optimise my code. After reading a couple of tutorials, it seems that focusing on the functions with the highest 'self' cost is a good idea.

I found two functions with high self cost (they are both called >1M times and have >10% self cost each, relative to the entire program execution time). In kcachegrind it shows:

[screenshot from kcachegrind]

Callgrind, however, does not tell me which parts of the function make up that self cost, making it difficult to optimise the code. What exactly is self cost and how can I attempt to reduce it?

My understanding/guess is that self cost includes reading/writing data, cache misses, basic maths operations, copying things on the stack (including function arguments), etc. How do I know which one it is before I can address it?

Thanks

The Authors
  • can you include the output in the question? – 463035818_is_not_an_ai Feb 05 '20 at 22:41
  • Hi @idclev463035818, what exactly would you need to see? (general info/callee map/source code?) – The Authors Feb 05 '20 at 22:55
  • I am mainly curious ;). Honestly I don't know yet if I can answer, but when I tried to search for it the first obstacle was that I didn't find anything about "self time", only about "self cost". Showing the output you are trying to interpret would help others to interpret it – 463035818_is_not_an_ai Feb 05 '20 at 22:57
  • You're right, I don't think self time is the correct terminology here. :) I replaced 'self time' with 'self' cost. I've included a small screenshot from kcachegrind. – The Authors Feb 05 '20 at 23:04
  • So out of 18% total cost for that particular function, 14% are self cost, 3-4% of which can be attributed to specific lines in the code (using the 'source code' tab in kcachegrind). How can I find out where the remaining ~10% of the self cost comes from? – The Authors Feb 06 '20 at 04:01

1 Answer

There are two ways that Callgrind/KCachegrind can represent costs.

  1. % Relative. This is the default, and all costs are represented as a percentage of the total.
  2. Absolute. This is a count of the "Cycle Estimation", which is derived from various "events" such as instruction reads, data cache misses, etc. By default Callgrind only counts instruction reads; you need to add the option --cache-sim=yes for cache simulation and --branch-sim=yes for branch predictor simulation. Be aware that Valgrind only has a simple cache simulation and a rudimentary branch predictor.

"Self" is the cost incurred in the function's own code (not counting any child functions). "Inclusive" is the cost incurred in a function plus all child functions that it calls, transitively. Note that Callgrind reports event counts rather than measured wall-clock time.

If you want to see a breakdown of the cost within a function, you need to compile your application with debug information. Then, after running your application under Callgrind and opening the output file in KCachegrind, you can look at the "Source Code" tab in the top right pane. This should give an indication of the cost on each line of the function.
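To make the "Self" versus "Inclusive" distinction concrete, here is a minimal sketch. The function names, the file name toy.cpp, and the workload are all hypothetical and are not taken from the question; the build and run commands in the comments are one plausible way to produce the output file, not the only one.

```cpp
// Hypothetical toy code for reading the "Self" and "Incl." columns.
// One possible way to build and profile it (file names are placeholders):
//   g++ -g -O2 toy.cpp -o toy
//   valgrind --tool=callgrind --dump-instr=yes ./toy
//   kcachegrind callgrind.out.<pid>
#include <cstdint>
#include <vector>

// Leaf function: it calls nothing, so its inclusive cost equals its self cost.
std::uint64_t sum_values(const std::vector<std::uint64_t>& values) {
    std::uint64_t sum = 0;
    for (std::uint64_t v : values) {
        sum += v;
    }
    return sum;
}

// The loop below is charged to mix_and_sum's self cost. The call to
// sum_values adds to mix_and_sum's inclusive cost only. Assumption to keep
// in mind: if the compiler inlines sum_values, its instructions are counted
// as part of mix_and_sum's self cost instead.
std::uint64_t mix_and_sum(const std::vector<std::uint64_t>& values,
                          std::uint32_t rounds) {
    std::uint64_t acc = 0;
    for (std::uint32_t i = 0; i < rounds; ++i) {
        acc += static_cast<std::uint64_t>(i) * 3u;  // self cost of mix_and_sum
    }
    return acc + sum_values(values);  // callee cost, inclusive only
}
```

To profile this, add a main that calls mix_and_sum on a large vector with a large rounds value. In KCachegrind, the inclusive cost of mix_and_sum should then exceed its self cost by roughly the cost of sum_values, and the "Source Code" tab should attribute the self cost to the loop lines. With optimisation enabled the compiler may inline or simplify parts of this, so the exact split can differ from what the source suggests.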

Paul Floyd