While trying to understand perf's cache-miss event on Intel machines, I came across the following description:
"PublicDescription": "Counts core-originated cacheable requests that miss the L3 cache (Longest Latency cache). Requests include data and code reads, Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches from L1 and L2. It does not include all misses to the L3.",
https://elixir.bootlin.com/linux/v4.18/source/tools/perf/pmu-events/arch/x86/skylake/cache.json#L163
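For context, here is a minimal sketch of how I understand this event to be encoded for perf. It assumes the linked JSON entry is LONGEST_LAT_CACHE.MISS with EventCode 0x2E and UMask 0x41, and that a raw x86 perf config puts the event code in bits 0-7 and the unit mask in bits 8-15. The helper name `raw_config` is my own, not part of perf.

```python
def raw_config(event_code: int, umask: int) -> int:
    """Pack an x86 event code (bits 0-7) and unit mask (bits 8-15)
    into a raw perf config value. Helper name is hypothetical."""
    if not (0 <= event_code <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event code and umask must each fit in 8 bits")
    return (umask << 8) | event_code


# Assumed values for LONGEST_LAT_CACHE.MISS, taken from the linked JSON entry.
config = raw_config(0x2E, 0x41)

# perf accepts raw events as "r" followed by the hex config.
perf_event_name = f"r{config:x}"
print(perf_event_name)  # r412e
```

If that reading is right, `perf stat -e r412e <command>` should count the same thing as the named event on this microarchitecture.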
I have the following questions:
- Since it does not include all L3 misses, what other kinds of accesses can cause L3 misses that this event does not count?
- Besides the cores, what else can originate L3 cache access requests? It should not be the hardware prefetchers, since hardware prefetches are already included in this metric.
- What about uncacheable requests? Would they cause L3 misses?
Supplementary questions on the cache coherence protocol:
- Would it count as a "cache-miss" if a load misses in the local L3 but hits in a remote L3 (on a multi-socket machine), with the target line in the F state of the MESIF protocol?
- What if the line in the remote L3 is in the M state and has to be written back to DRAM before the remote cache can respond to the request (assuming MESIF)?
- What about MOESI on AMD, which does not require a write-back to DRAM before the modified data is forwarded?