
I am measuring a process's memory bandwidth on a Cortex-A78 running Linux, using perf, and I got the following output:

       18,312,265      ll_cache_miss_rd          #   0.507 M/sec                    (28.64%)
       36,006,163      l3_cache_refill           #   0.996 M/sec                    (42.68%)

The Cortex-A78 PMU documentation says:

LL_CACHE_MISS_RD Last level cache miss, read.
• If CPUECTLR.EXTLLC is set: This event counts any cacheable read transaction which returns a data source of 'DRAM', 'remote' or 'inter-cluster peer'.
• If CPUECTLR.EXTLLC is not set: This event is a duplicate of the L*D_CACHE_REFILL_RD event corresponding to the last level of cache implemented – L3D_CACHE_REFILL_RD if both per-core L2 and cluster L3 are implemented, L2D_CACHE_REFILL_RD if only one is implemented, or L1D_CACHE_REFILL_RD if neither is implemented.

In my Cortex-A78 system, L3 is the last level of cache and CPUECTLR.EXTLLC is 0, so ll_cache_miss_rd should be a duplicate of L3D_CACHE_REFILL_RD. But it is NOT: the refill event reports roughly twice the count of the miss_rd event!
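
To put a number on "roughly twice", here is a quick check of the two counts from the perf output above. This is only a minimal sketch; the helper name `count_ratio` is made up for illustration and is not part of perf.

```python
# Counts copied from the perf stat output above.
ll_cache_miss_rd = 18_312_265
l3_cache_refill = 36_006_163


def count_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, rejecting a zero denominator."""
    if denominator == 0:
        raise ValueError("denominator count is zero")
    return numerator / denominator


ratio = count_ratio(l3_cache_refill, ll_cache_miss_rd)
print(f"l3_cache_refill / ll_cache_miss_rd = {ratio:.3f}")
```

So the refill count is a little under 2x the miss count, not an exact doubling.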

Since the event ID of L3D_CACHE_REFILL is 0x2A, I added -e r2a to my perf command to capture the raw event count. The r2a line shows a count similar to ll_cache_miss_rd, which is very different from the l3_cache_refill line.
Did I miss something in interpreting the events from the documentation, or is something wrong with how I am testing with perf on ARMv8?

Peter Cordes
wangt13
