Measure dirty evicts of Caches using Linux Perf or other Perf monitors

Question

The Linux perf tool provides data on cache behavior through events such as "LLC-load-misses", which counts reads that missed the LLC, and "LLC-store-misses", which counts write misses. What exactly is a write miss here? In a modern write-allocate cache, a write miss generates a read request to the next level of memory, so do the counts from LLC-store-misses actually represent the read requests generated by write misses?
I also wonder whether any perf event tracks dirty evictions, or whether they are counted internally by the events mentioned above or similar ones.

"LLC-store-misses" is a generic event that has to pick some actual hardware event depending on what HW you're actually running on. On my Skylake system, it seems the count is always the same as `offcore_response.demand_rfo.l3_miss.any_snoop`, which you could try to look up something about in the Intel manual. (So it's about counting cache lines, not counting individual stores!) That event was just a guess out of what `perf list` showed, based on being a "client" chip that didn't have to snoop any other sockets, and I'm not sure exactly what it means :P — Peter Cordes, Mar 17 '22 at 16:04
For LLC evictions, perhaps uncore events would be relevant, but IDK which. (Assuming you're talking about Intel CPUs, but you didn't say.) — Peter Cordes, Mar 17 '22 at 16:07
Thanks for your response. Yes, I'm on a Skylake system and the same event `offcore_response.demand_rfo.l3_miss.any_snoop` produces similar numbers for me too but the description says - "counts all demand data writes (RFOs) that miss in the L3". Now, I'm not sure if it represents **"write requests that miss L3 and as a result it is read from memory back to child caches and modified at L1"** or **"dirty evicts from L3 cache"**. But since the name says "demand_rfo", I want to assume it is the former. — ShAd, Mar 17 '22 at 18:32
I'm pretty sure it's not at all counting dirty evicts. A clean evict to make room for the RFO data would still count, I think. If there was an invalid line in L3, it might not even need to evict anything to make room. Also remember that this is counting per line, not per store instruction. Two nearby stores will end up using the same LFB to wait for the RFO to finish. For per-instruction counts, you'd want `mem_inst_retired` events, but there aren't ones that break down stores by hit/miss. (Probably because they haven't even tried to commit to L1d yet when they retire.) — Peter Cordes, Mar 17 '22 at 18:37
I couldn't find any uncore event that specifically addresses evicts or dirty evicts of any cache — ShAd, Mar 17 '22 at 18:38
Only the shared L3 is part of the uncore (and optional L4 on Iris graphics parts); L1i/d and L2 are part of each core. Dirty write-back from L2 is `l2_lines_out.non_silent` — Peter Cordes, Mar 17 '22 at 18:39
oh sorry, you are correct, I forgot about `l2_lines_out.non_silent` event. I think I want to look at LLC or L3 evictions specifically. I can probably look at `llc_misses.mem_write` event in uncore memory that I guess represents L3 dirty evicts but that event is not supported in my system. — ShAd, Mar 17 '22 at 18:43
`intel_gpu_top -l` will show you the integrated memory controller's (IMC's) current read/write bandwidth, updating every second. (System-wide, not just from GPU activity despite the name of that tool). — Peter Cordes, Mar 17 '22 at 18:45
Oh, does that stop `intel_gpu_top -l` from showing you the memory controller bandwidth? I do use the iGPU on my SKL. — Peter Cordes, Mar 17 '22 at 18:49
Yeah I think so. When I run `intel_gpu_top -l`, it says `Failed to detect engines! (No such file or directory)`. I think I found an event called `PAPI_L3_DCW` in the PAPI tool that may potentially measure dirty writes. I can try that and see. — ShAd, Mar 17 '22 at 18:52
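To compare the events mentioned in the comments side by side, one option is to run them all under a single `perf stat` and parse its CSV output. The following is only a minimal sketch, not a verified measurement recipe. It assumes `perf stat -x,` writes CSV rows to stderr with the count in the first field and the event name in the third, that the event names from the thread exist on your CPU (check `perf list`), and that cache lines are 64 bytes. The helper names (`build_perf_command`, `parse_perf_csv`, `lines_to_bytes`, `run`) are hypothetical.

```python
import csv
import io
import subprocess

# Event names taken from the comments above. Whether each one is available
# depends on the CPU and kernel, so treat this list as an assumption.
EVENTS = [
    "LLC-store-misses",
    "offcore_response.demand_rfo.l3_miss.any_snoop",
    "l2_lines_out.non_silent",
]

CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines


def build_perf_command(workload, events=EVENTS):
    """Return an argv list that runs `workload` under `perf stat` in CSV mode."""
    return ["perf", "stat", "-x,", "-e", ",".join(events), "--"] + list(workload)


def parse_perf_csv(text):
    """Map event name -> integer count, or None when perf reported no number
    (for example "<not supported>" or "<not counted>")."""
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, comment lines, and anything too short to be a counter row.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        value, event = row[0].strip(), row[2].strip()
        counts[event] = int(value) if value.isdigit() else None
    return counts


def lines_to_bytes(line_count):
    """These events count cache lines, not store instructions, so convert
    a line count to bytes of traffic under the 64-byte assumption."""
    return line_count * CACHE_LINE_BYTES


def run(workload):
    """Run the workload under perf and return the parsed counts."""
    result = subprocess.run(
        build_perf_command(workload), capture_output=True, text=True
    )
    # perf stat prints its counters to stderr by default.
    return parse_perf_csv(result.stderr)
```

For example, `run(["./my_benchmark"])` (a placeholder binary name) would return a dictionary keyed by event name. An entry of `None` means perf could not count that event on the machine, as happened with `llc_misses.mem_write` above.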

