A slot is an execution port of the pipeline. In the VTune documentation, a stall can generally mean either "not retired" or "not dispatched for execution". In this case, it refers to the number of cycles in which zero uops were dispatched.
According to the VTune include configuration files, Memory Bound is calculated as follows:
Memory_Bound = Memory_Bound_Fraction * BackendBound
Memory_Bound_Fraction is basically the fraction of slots mentioned in the documentation. However, according to the top-down method discussed in the optimization manual, the memory bound metric is relative to the backend bound metric, which is why it is multiplied by BackendBound.
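As a minimal sketch of this relationship, the product can be written out in Python. The function name and the sample values are hypothetical and only illustrate the arithmetic; they are not taken from the VTune configuration files.

```python
def memory_bound(memory_bound_fraction: float, backend_bound: float) -> float:
    """Scale the memory-related share of backend stalls by the backend-bound metric."""
    return memory_bound_fraction * backend_bound


# Hypothetical inputs: BackendBound is 0.40 and 75% of that is memory related.
print(memory_bound(0.75, 0.40))
```

Because Memory_Bound_Fraction is at most the whole of the backend-bound portion, the result can never exceed BackendBound itself.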
I'll focus on the first term of the formula, Memory_Bound_Fraction. The formula for the second term, BackendBound, is actually complicated.
Memory_Bound_Fraction is calculated as follows:
Memory_Bound_Fraction = ((CYCLE_ACTIVITY.STALLS_MEM_ANY + RESOURCE_STALLS.SB) * NUM_OF_PORTS) / (Backend_Bound_Cycles * NUM_OF_PORTS)
NUM_OF_PORTS is the number of execution ports of the microarchitecture of the target CPU. Since it appears in both the numerator and the denominator, it cancels out, and the formula simplifies to:
Memory_Bound_Fraction = (CYCLE_ACTIVITY.STALLS_MEM_ANY + RESOURCE_STALLS.SB) / Backend_Bound_Cycles
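The cancellation of NUM_OF_PORTS can be checked with a short Python sketch. The function names and the event counts below are made up for illustration, and the sketch assumes the port count multiplies both the numerator and the denominator.

```python
import math


def memory_bound_fraction_full(stalls_mem_any, resource_stalls_sb,
                               backend_bound_cycles, num_of_ports):
    """Slot-based form: both terms are scaled by the number of execution ports."""
    numerator = (stalls_mem_any + resource_stalls_sb) * num_of_ports
    denominator = backend_bound_cycles * num_of_ports
    return numerator / denominator


def memory_bound_fraction(stalls_mem_any, resource_stalls_sb, backend_bound_cycles):
    """Simplified form: the port count has cancelled out."""
    return (stalls_mem_any + resource_stalls_sb) / backend_bound_cycles


# Hypothetical event counts, not measured on real hardware.
full = memory_bound_fraction_full(600, 150, 1000, num_of_ports=8)
simple = memory_bound_fraction(600, 150, 1000)
print(full, simple, math.isclose(full, simple))
```

The two forms agree for any nonzero port count, so the metric can be computed from cycle counts alone.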
CYCLE_ACTIVITY.STALLS_MEM_ANY and RESOURCE_STALLS.SB are performance events. Backend_Bound_Cycles is calculated as follows:
Backend_Bound_Cycles = CYCLE_ACTIVITY.STALLS_TOTAL + UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC - Few_Uops_Executed_Threshold - Frontend_RS_Empty_Cycles + RESOURCE_STALLS.SB
Few_Uops_Executed_Threshold is either UOPS_EXECUTED.CYCLES_GE_2_UOP_EXEC or UOPS_EXECUTED.CYCLES_GE_3_UOP_EXEC, depending on some other metric. Frontend_RS_Empty_Cycles is either RS_EVENTS.EMPTY_CYCLES or zero, depending on some metric.
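The Backend_Bound_Cycles formula can be sketched in Python as follows. The conditions that select Few_Uops_Executed_Threshold and Frontend_RS_Empty_Cycles are not spelled out here, so this sketch simply takes both as arguments and leaves the choice to the caller. The function name and all counts are hypothetical.

```python
def backend_bound_cycles(stalls_total, cycles_ge_1_uop_exec,
                         few_uops_executed_threshold,
                         frontend_rs_empty_cycles, resource_stalls_sb):
    """Combine the event counts exactly as in the Backend_Bound_Cycles formula.

    few_uops_executed_threshold: the caller passes either the
        UOPS_EXECUTED.CYCLES_GE_2_UOP_EXEC or the
        UOPS_EXECUTED.CYCLES_GE_3_UOP_EXEC count.
    frontend_rs_empty_cycles: the caller passes either the
        RS_EVENTS.EMPTY_CYCLES count or zero.
    """
    return (stalls_total
            + cycles_ge_1_uop_exec
            - few_uops_executed_threshold
            - frontend_rs_empty_cycles
            + resource_stalls_sb)


# Hypothetical counts, with the CYCLES_GE_2_UOP_EXEC variant chosen as the threshold.
cycles = backend_bound_cycles(
    stalls_total=500,
    cycles_ge_1_uop_exec=900,
    few_uops_executed_threshold=600,
    frontend_rs_empty_cycles=50,
    resource_stalls_sb=100,
)
print(cycles)
```

Keeping the two selections outside the function avoids guessing at the metric that drives them.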
I realize this answer still needs a lot of additional explanation, and BackendBound needs to be expanded. But this early edit makes the answer accurate.