
I am trying to understand the meaning of various Intel performance monitoring counters, and I also want to measure load stalls using the counters available under RESOURCE_STALLS.

The following are approximate per-second values of all RESOURCE_STALLS counters for a program running on my system (INTEL_BROADWELL_XEON):

RESOURCE_STALLS.ANY = 522266857
RESOURCE_STALLS.SB  = 249785706
RESOURCE_STALLS.ROB  = 78120602
RESOURCE_STALLS.RS   = 53729085

Questions:

Does RESOURCE_STALLS.SB count store stall cycles?

How can I find load stalls?

Can we subtract the sum of RESOURCE_STALLS.ROB, RESOURCE_STALLS.SB and RESOURCE_STALLS.RS from RESOURCE_STALLS.ANY to get the approximate number of cycles spent in load stalls?
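
To make the arithmetic in the last question concrete, here is a minimal sketch using the values above. The helper name `residual_stalls` is made up for illustration. It assumes the three sub-events never count the same cycle, which I have not verified, so the result may not correspond to load stalls at all.

```python
# Approximate per-second counts reported in the question.
counts = {
    "RESOURCE_STALLS.ANY": 522_266_857,
    "RESOURCE_STALLS.SB": 249_785_706,
    "RESOURCE_STALLS.ROB": 78_120_602,
    "RESOURCE_STALLS.RS": 53_729_085,
}


def residual_stalls(c):
    """Return ANY minus the sum of the SB, ROB and RS sub-events.

    Hypothetical helper: it treats the sub-events as disjoint, which is
    an unverified assumption (they may overlap within a cycle).
    """
    named = sum(v for k, v in c.items() if k != "RESOURCE_STALLS.ANY")
    return c["RESOURCE_STALLS.ANY"] - named


residual = residual_stalls(counts)
share = residual / counts["RESOURCE_STALLS.ANY"]
print(f"residual = {residual} cycles/s ({share:.1%} of ANY)")
```

With these numbers the residual is 140631464 cycles per second. What I would like to know is whether that figure can be attributed to loads.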

Thanks,
TS

TS_6607
  • 1
    Stores being slow to commit is only a problem because the store buffer can fill up. Loads being slow to return data is a big problem because later uops usually depend on their results. (So you get the RS filling up, and sometimes the ROB.) The other thing that makes stores special and more worth tracking separately is that stores live in the store buffer *after* retiring from the ROB (such stores are called "graduated", and only at that point can they be considered for commit to L1d). – Peter Cordes Oct 12 '22 at 08:14
  • 1
    I highly doubt you can subtract the sum of other stalls from ANY and get something meaningful. The ROB and/or RS can be full at the same time as the store buffer, and loads not being able to issue from the front-end because you're out of load-buffer entries could happen without the CPU being stalled (which IIRC means no uops dispatched for execution that cycle.) – Peter Cordes Oct 12 '22 at 08:19
