
While trying to understand perf's `cache-misses` event on Intel machines, I came across the following event description:

"PublicDescription": "Counts core-originated cacheable requests that miss the L3 cache (Longest Latency cache). Requests include data and code reads, Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches from L1 and L2. It does not include all misses to the L3.",

https://elixir.bootlin.com/linux/v4.18/source/tools/perf/pmu-events/arch/x86/skylake/cache.json#L163
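
For context, here is a minimal sketch of how this description relates to a raw perf event. It assumes the description belongs to `LONGEST_LAT_CACHE.MISS` (event code 0x2E, umask 0x41), the architectural LLC-miss encoding that I understand `cache-misses` maps to on Intel. The helper name `raw_event` is hypothetical.

```python
def raw_event(event_code: int, umask: int) -> str:
    """Build a perf raw-event string (rUUEE) from an event code and a unit mask."""
    if not (0 <= event_code <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event code and umask must each fit in one byte")
    # In perf's raw format the umask occupies bits 8-15 and the
    # event select occupies bits 0-7 of the config value.
    return f"r{(umask << 8) | event_code:04x}"


# Assumption: LONGEST_LAT_CACHE.MISS is event 0x2E with umask 0x41.
LLC_MISS = raw_event(0x2E, 0x41)
print(LLC_MISS)
```

If that assumption holds, `perf stat -e r412e <command>` should count the same event as `perf stat -e cache-misses <command>` on such a machine.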

I have the following questions:

  1. Since it does not include all L3 misses, what other actions can cause L3 misses?
  2. Besides cores, who else would originate L3 cache access requests? It should not be hardware prefetching, as hardware prefetching is already included in this metric.
  3. What about uncacheable requests? Would they cause L3 misses?

Supplementary questions on the cache coherence protocol:

  1. Would it count as a "cache miss" if a load misses in the local L3 but hits in a remote L3 (on a multi-socket machine), with the target line in the F state of the MESIF protocol?
  2. What if the line in the remote L3 is in the M state and requires a write-back to DRAM before the remote socket can respond to the request (assuming MESIF)?
  3. What if we are running MOESI on AMD, which doesn't require a flush to DRAM before sending the modified data?
  • 2. On a Xeon at least, DMA can go directly into L3 cache. The same text might appear for Skylake-server, not just client CPUs. 3. I wouldn't expect NT loads/stores (or normal loads/stores to UC or WC mappings) to cause L3 misses. They evict the data if it's present, so I guess they do have to probe the cache. A not-present result is then a good thing: it means no dirty write-back or eviction is necessary. You wouldn't normally call that a "miss"; it's more like an invalidate message. – Peter Cordes Apr 12 '23 at 06:42
  • @PeterCordes Thank you very much for your answer. Would it count as a "cache miss" if a load misses in the local L3 but hits in a remote L3 (on a multi-socket machine), with the target line in the F state of the MESIF protocol? What if the line in the remote L3 is in the M state and requires a write-back to DRAM before the remote socket can respond (assuming MESIF)? And what if we are running MOSIF on AMD? – Frontier_Setter Apr 12 '23 at 08:28
  • I made a typo: I meant AMD's "MOESI", not "MOSIF". – Frontier_Setter Apr 12 '23 at 08:40
  • Good question, I don't know. You might be able to investigate if you can create a situation where you get a lot of remote hits. Perhaps there are offcore events that can help check whether you're actually getting the hits/misses you expect, e.g. high counts for `offcore_response.demand_data_rd.l3_miss_local_dram.any_snoop` might tell you that you aren't hitting in cache after all, and that the page is actually local not remote. (IDK, haven't used it.) As for AMD, the microarchitectural event that `perf` maps `cache-misses` to would have its own rules that depend on the AMD microarchitecture. – Peter Cordes Apr 12 '23 at 08:41
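
One way to try the suggestion above is to count the generic event and the offcore event side by side. The following is a minimal sketch that only builds the `perf stat` command line and does not run it. `build_perf_cmd` and `./workload` are hypothetical names, and the offcore event name is the one quoted in the comment, which may not exist on every CPU (check `perf list`).

```python
import shlex


def build_perf_cmd(events, workload):
    """Return an argv list that counts `events` while running `workload`."""
    if not events or not workload:
        raise ValueError("need at least one event and a workload command")
    # `-e` takes a comma-separated event list; `--` separates perf's own
    # options from the command being measured.
    return ["perf", "stat", "-e", ",".join(events), "--", *workload]


events = [
    "cache-misses",
    # Event name taken from the comment above; availability is an assumption.
    "offcore_response.demand_data_rd.l3_miss_local_dram.any_snoop",
]
cmd = build_perf_cmd(events, ["./workload"])  # ./workload is a placeholder
print(shlex.join(cmd))
```

If `cache-misses` stays high while the local-DRAM offcore count stays low, that would hint that the misses are being served from somewhere other than local DRAM, though it would not by itself prove a remote L3 hit.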
