I am trying to profile an application with perf, and for now I am only interested in its traffic to and from DRAM. From the results, I could not work out what throughput the application is getting from DRAM.
This is how I invoked the perf command:
perf stat -av -e LLC-misses,cache-misses,L1-dcache-load-misses <application>
I am using -a (system-wide collection) because the application communicates with another daemon process that is already running.
The result I obtain is the following:
LLC-misses: 0 288628898 288606144
cache-misses: 373507 287154835 287143402
L1-dcache-load-misses: 3831372 286357135 286357135
Performance counter stats for './mclient -d tpch-sf1 /home/lottarini/Desktop/DPU/queries/tpch-monetdb/02.sql':
0 LLC-misses [99.99%]
373,507 cache-misses [100.00%]
3,831,372 L1-dcache-load-misses
0.035855129 seconds time elapsed
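If I read each -v line above as (raw count, time enabled, time running), the bracketed percentages can be reproduced as running/enabled, with no bracket printed when the two times are equal. The sketch below checks that reading against the numbers above; the interpretation of the three columns is my assumption about perf's verbose format, and the helper names are hypothetical.

```python
# Raw lines from `perf stat -v` above. Assumption: the three numbers are
# the raw count, the time the event was enabled, and the time it was running.
VERBOSE = {
    "LLC-misses": (0, 288628898, 288606144),
    "cache-misses": (373507, 287154835, 287143402),
    "L1-dcache-load-misses": (3831372, 286357135, 286357135),
}


def running_share(enabled: int, running: int) -> float:
    """Percentage of the enabled time during which the counter was running."""
    return 100.0 * running / enabled


def bracket(enabled: int, running: int) -> str:
    """Mimic the bracketed suffix: empty when the counter ran the whole time."""
    if running == enabled:
        return ""
    return f"[{running_share(enabled, running):.2f}%]"


for name, (count, enabled, running) in VERBOSE.items():
    print(f"{count:>12,} {name:<24} {bracket(enabled, running)}")
```

Under this reading, L1-dcache-load-misses gets no bracket simply because its enabled and running times are identical.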
My understanding is that cache-misses counts the memory references that missed in the whole cache hierarchy. This is consistent with the fact that I get many more L1 misses than cache-misses.
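To put a number on "many more", a quick ratio of the two counts above. This is only a rough indication: L1-dcache-load-misses counts load misses only, while what the generic cache-misses event counts depends on the CPU, so the ratio is not exactly "the fraction of L1 misses that reached DRAM".

```python
# Counts from the perf stat output above.
L1_DCACHE_LOAD_MISSES = 3_831_372
CACHE_MISSES = 373_507

# Rough share of L1 load misses that also show up as cache-misses
# (the two events may not count the same kinds of accesses).
ratio = CACHE_MISSES / L1_DCACHE_LOAD_MISSES
print(f"cache-misses / L1-dcache-load-misses = {ratio:.1%}")
```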
First of all, why doesn't the tool print a confidence value for the L1 misses?
Second, why is the cache-misses count different from the LLC-misses count (which is 0)? If an access misses in the whole cache hierarchy, it must also miss in the LLC.
Finally, if I want to know how much data was transferred because of these misses, how can I compute it? Is there a perf event I can specify, or do I need to multiply these counts by the size of the memory block (whatever that is) that is transferred on a miss?
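A back-of-the-envelope sketch of the second option, assuming each counted miss moves exactly one cache line from DRAM. The line size is read from Linux sysfs when available, with a fallback of 64 bytes (typical on x86, but an assumption here). The helper names are hypothetical, and the model ignores hardware prefetches, write-backs of dirty lines and, because of -a, misses caused by other processes, so the result is only a rough estimate.

```python
from pathlib import Path

# Assumption: one cache line is moved from DRAM per counted miss.
# Prefetches, write-backs and other processes' misses (with -a) are ignored.
SYSFS_LINE_SIZE = Path("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size")


def cache_line_size(default: int = 64) -> int:
    """Read the line size from sysfs on Linux, else fall back to `default` bytes."""
    try:
        return int(SYSFS_LINE_SIZE.read_text().strip())
    except (OSError, ValueError):
        return default


def dram_traffic(misses: int, seconds: float, line_size: int) -> tuple[int, float]:
    """Return (bytes moved, bytes per second) under the one-line-per-miss model."""
    moved = misses * line_size
    return moved, moved / seconds


# cache-misses count and elapsed time from the perf stat output above.
line = cache_line_size()
moved, rate = dram_traffic(373_507, 0.035855129, line)
print(f"line size: {line} B")
print(f"~{moved / 1e6:.1f} MB moved, ~{rate / 1e6:.0f} MB/s")
```

With a 64-byte line, the cache-misses count above works out to about 23.9 MB over 0.036 s, or roughly 667 MB/s.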