I need to profile a program under development to find its bottlenecks, in particular any caused by memory accesses. To do this I used Cachegrind, the cache profiler that ships with Valgrind.
I compiled the program with gcc and the -g flag, then ran Cachegrind with the command valgrind --tool=cachegrind ./a.out.
The result printed on the terminal was as follows:
==11611== Cachegrind, a cache and branch-prediction profiler
==11611== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==11611== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==11611== Command: ./profiling
==11611==
--11611-- warning: L3 cache found, using its data for the LL simulation.
Elapsed computation time: 33.46223 seconds
==11611==
==11611== I refs: 10,918,854,735
==11611== I1 misses: 1,655
==11611== LLi misses: 1,646
==11611== I1 miss rate: 0.00%
==11611== LLi miss rate: 0.00%
==11611==
==11611== D refs: 4,620,671,815 (4,235,254,268 rd + 385,417,547 wr)
==11611== D1 misses: 3,222,370 ( 2,887,833 rd + 334,537 wr)
==11611== LLd misses: 18,506 ( 16,679 rd + 1,827 wr)
==11611== D1 miss rate: 0.1% ( 0.1% + 0.1% )
==11611== LLd miss rate: 0.0% ( 0.0% + 0.0% )
==11611==
==11611== LL refs: 3,224,025 ( 2,889,488 rd + 334,537 wr)
==11611== LL misses: 20,152 ( 18,325 rd + 1,827 wr)
==11611== LL miss rate: 0.0% ( 0.0% + 0.0% )
What I don't understand is the final LL miss rate: computing LL misses / LL refs * 100, i.e. 20,152 / 3,224,025 * 100, gives about 0.6%, yet the terminal reports 0.0%. Is this some approximation made by Cachegrind?
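One way to see where a 0.0% could come from is to recompute the ratio from the printed counts with two candidate denominators. The "global" denominator used below (I refs + D refs, i.e. every reference the program issued) is an assumption about how Cachegrind defines its miss rates, not something the output itself states; the D1 rate is included only as a cross-check against a value the summary does print.

```python
# Counts copied from the Cachegrind summary above.
I_REFS = 10_918_854_735
D_REFS = 4_620_671_815          # 4,235,254,268 rd + 385,417,547 wr
D1_MISSES = 3_222_370
LL_REFS = 3_224_025
LL_MISSES = 20_152


def percent(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Local rate: LL misses relative to the accesses that actually reached LL.
local_rate = percent(LL_MISSES, LL_REFS)

# Global rate (ASSUMED Cachegrind convention): LL misses relative to all
# instruction and data references issued by the program.
global_rate = percent(LL_MISSES, I_REFS + D_REFS)

# Cross-check: D1 misses relative to D refs should reproduce the printed 0.1%.
d1_rate = percent(D1_MISSES, D_REFS)

print(f"LL misses / LL refs:           {local_rate:.1f}%")
print(f"LL misses / (I refs + D refs): {global_rate:.1f}% (unrounded {global_rate:.5f}%)")
print(f"D1 misses / D refs:            {d1_rate:.1f}%")
```

If the assumption holds, the reported 0.0% is not an approximation error but a ratio with a much larger denominator (about 0.00013%) that rounds to zero at one decimal place.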
In KCachegrind I only get percentages next to the event types and next to the lines of code (as in the figure). Is it possible to see the absolute number of misses instead?
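The absolute counts are stored in the Cachegrind output file itself (by default cachegrind.out.<pid>, so presumably cachegrind.out.11611 for the run above), and cg_annotate, which ships with Valgrind, reads that file. As a minimal sketch, the hypothetical script below sums the raw counts per function directly, assuming the plain-text cachegrind.out layout: an "events:" header naming the columns (typically Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw), "fl=" and "fn=" lines selecting the current file and function, and cost lines of the form "<line> <count> <count> ...". The script name cg_counts.py is made up for illustration.

```python
import sys


def parse_cachegrind(text):
    """Sum absolute event counts per (file, function) in a cachegrind.out text.

    ASSUMES the plain-text layout described above; lines such as 'desc:',
    'cmd:' and 'summary:' are ignored.
    """
    events = []
    totals = {}
    current_file = current_fn = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("events:"):
            events = line.split()[1:]
        elif line.startswith("fl="):
            current_file = line[3:]
        elif line.startswith("fn="):
            current_fn = line[3:]
        elif line[:1].isdigit() and events:
            # First field is the source line number; the rest are event counts.
            counts = [int(x) for x in line.split()[1:]][: len(events)]
            # Treat any omitted trailing columns as zero.
            counts += [0] * (len(events) - len(counts))
            acc = totals.setdefault((current_file, current_fn), [0] * len(events))
            for i, c in enumerate(counts):
                acc[i] += c
    return events, totals


def main(argv):
    if len(argv) != 2:
        print("usage: python cg_counts.py cachegrind.out.<pid>")
        return
    with open(argv[1]) as f:
        events, totals = parse_cachegrind(f.read())
    print("\t".join(events + ["file:function"]))
    # Sort functions by the first event column (normally Ir), largest first.
    for (fl, fn), counts in sorted(totals.items(), key=lambda kv: -kv[1][0]):
        print("\t".join(str(c) for c in counts) + f"\t{fl}:{fn}")


if __name__ == "__main__":
    main(sys.argv)
```

Run as python cg_counts.py cachegrind.out.11611; the D1mr/DLmr and D1mw/DLmw columns then give raw miss counts per function rather than percentages.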