I am measuring a process's memory bandwidth on a Cortex-A78 running Linux, using perf, and I got the following output:
18,312,265 ll_cache_miss_rd # 0.507 M/sec (28.64%)
36,006,163 l3_cache_refill # 0.996 M/sec (42.68%)
The Cortex-A78 PMU documentation says:
LL_CACHE_MISS_RD Last level cache miss, read.
• If CPUECTLR.EXTLLC is set: This event counts any cacheable read transaction which returns a data source of 'DRAM', 'remote' or 'inter-cluster peer'.
• If CPUECTLR.EXTLLC is not set: This event is a duplicate of the L*D_CACHE_REFILL_RD event corresponding to the last level of cache implemented – L3D_CACHE_REFILL_RD if both per-core L2 and cluster L3 are implemented, L2D_CACHE_REFILL_RD if only one is implemented, or L1D_CACHE_REFILL_RD if neither is implemented.
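The rule quoted above can be written as a small lookup. This is a minimal Python sketch that only restates the documentation text. The function name ll_cache_miss_rd_alias and its boolean parameters are hypothetical and are not part of any perf or Arm API.

```python
def ll_cache_miss_rd_alias(extllc: bool, has_l2: bool, has_l3: bool) -> str:
    """Return what LL_CACHE_MISS_RD counts, following the quoted A78 PMU text.

    has_l2: a per-core L2 is implemented; has_l3: a cluster L3 is implemented.
    """
    if extllc:
        # CPUECTLR.EXTLLC set: cacheable reads whose data source is
        # 'DRAM', 'remote' or 'inter-cluster peer'.
        return "reads sourced from DRAM/remote/inter-cluster peer"
    if has_l2 and has_l3:
        return "L3D_CACHE_REFILL_RD"
    if has_l2 or has_l3:
        # The documentation says L2D_CACHE_REFILL_RD if only one is implemented.
        return "L2D_CACHE_REFILL_RD"
    return "L1D_CACHE_REFILL_RD"


# The configuration in this question: L2 and L3 present, EXTLLC = 0.
print(ll_cache_miss_rd_alias(extllc=False, has_l2=True, has_l3=True))
# → L3D_CACHE_REFILL_RD
```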
On my Cortex-A78 system, L3 is the last-level cache and CPUECTLR.EXTLLC is 0, so ll_cache_miss_rd should be a duplicate of L3D_CACHE_REFILL_RD. But it is not: l3_cache_refill reports roughly twice as many events as ll_cache_miss_rd (36,006,163 vs. 18,312,265).
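One detail worth checking is the percentage at the end of each output line. perf stat prints it when events are multiplexed onto the hardware counters: it is the fraction of the run during which that counter was actually counting, and the displayed count is scaled up from that window. The minimal Python sketch below (parse_perf_stat and PERF_OUTPUT are hypothetical names) parses the two lines above, computes the ratio, and shows that the two events were counted over different fractions of the run, so both numbers are extrapolated estimates rather than exact counts.

```python
import re
from typing import Dict, Optional, Tuple

# The two lines from the perf stat output in this question.
PERF_OUTPUT = """\
18,312,265 ll_cache_miss_rd # 0.507 M/sec (28.64%)
36,006,163 l3_cache_refill # 0.996 M/sec (42.68%)
"""

# Count (with thousands separators), event name, and the optional
# "(NN.NN%)" column that perf adds when the event was multiplexed.
LINE_RE = re.compile(r"^\s*([\d,]+)\s+(\S+).*?(?:\(([\d.]+)%\))?\s*$")


def parse_perf_stat(text: str) -> Dict[str, Tuple[int, Optional[float]]]:
    """Map event name -> (scaled count, percent of time the counter ran)."""
    events = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            count = int(m.group(1).replace(",", ""))
            pct = float(m.group(3)) if m.group(3) else None
            events[m.group(2)] = (count, pct)
    return events


events = parse_perf_stat(PERF_OUTPUT)
miss_rd, miss_pct = events["ll_cache_miss_rd"]
refill, refill_pct = events["l3_cache_refill"]
print(f"l3_cache_refill / ll_cache_miss_rd = {refill / miss_rd:.2f}")  # 1.97
print(f"counter running time: {miss_pct}% vs {refill_pct}%")
```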
Since the event ID of L3D_CACHE_REFILL is 0x2A, I added -e r2a to my perf command to capture the raw event count. The r2a line shows a count similar to ll_cache_miss_rd, which is very different from the l3_cache_refill line.
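One way to take multiplexing out of the comparison is to put the events in a single perf event group using perf's {...} syntax, so they are scheduled onto counters together and count over the same time window. Below is a minimal Python sketch assuming perf is on PATH. The helpers perf_stat_group and parse_perf_csv are hypothetical names, and using r37 assumes 0x37 is the LL_CACHE_MISS_RD event number, which should be checked against the Cortex-A78 TRM.

```python
import subprocess
from typing import Dict, List


def parse_perf_csv(text: str) -> Dict[str, int]:
    """Parse `perf stat -x,` lines of the form value,unit,event,run-time,pct,...

    Lines whose first field is not a plain integer (blank lines, comments,
    "<not counted>" / "<not supported>" entries) are skipped.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) < 3 or not fields[0].strip().isdigit():
            continue
        counts[fields[2]] = int(fields[0])
    return counts


def perf_stat_group(cmd: List[str], raw_events: List[str]) -> Dict[str, int]:
    """Run `cmd` under perf stat with all raw events in one event group."""
    # "{a,b}" is perf's group syntax: the events are scheduled onto the PMU
    # together, so they count over the same time window.
    group = "{" + ",".join(raw_events) + "}"
    # -x, selects CSV output; perf stat prints its counts on stderr.
    argv = ["perf", "stat", "-x,", "-e", group, "--"] + cmd
    result = subprocess.run(argv, capture_output=True, text=True)
    return parse_perf_csv(result.stderr)
```

For example, perf_stat_group(["./my_benchmark"], ["r2a", "r37"]) (with ./my_benchmark standing in for the real workload) returns a dict keyed by the event names perf prints. Because grouped events share one time window, any remaining gap between them should not come from scaling.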
Am I misinterpreting the events described in the documentation, or is something wrong with how I am testing with perf on ARMv8?