I am analyzing the performance of an application running on an IBM POWER8 server, following the CPI breakdown model for POWER8.
I understand that I need to reduce the percentage of stalls caused, for example, by cache misses (PM_CMPLU_STALL_DCACHE_MISS) or branch mispredictions (PM_CMPLU_STALL_BRU). The POWER7 performance analysis tutorial says that a well-written application has a high final instruction completion percentage (PM_1PLUS_PPC_CMPL).
Do I understand correctly that for POWER8 I need to maximize the percentage for the PM_GRP_CMPL metric? What other PMU-based metrics should I try to maximize?
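
To make "percentage" concrete, here is a minimal sketch of how such shares could be derived from raw counter values. It assumes each event is normalized by PM_RUN_CYC; that choice of denominator is an assumption on my part, not something the CPI breakdown model is quoted as requiring here. The helper name `cycle_share` is hypothetical, and the counts are made-up placeholders, not measurements.

```python
def cycle_share(event_count, run_cycles):
    """Return an event count as a percentage of run cycles."""
    if run_cycles <= 0:
        raise ValueError("run_cycles must be positive")
    return 100.0 * event_count / run_cycles


# Hypothetical raw counts for illustration only (not measured data).
counts = {
    "PM_RUN_CYC": 1_000_000,                # assumed denominator
    "PM_CMPLU_STALL_DCACHE_MISS": 250_000,  # stalls due to cache misses
    "PM_CMPLU_STALL_BRU": 50_000,           # stalls due to branches
    "PM_GRP_CMPL": 400_000,                 # groups completed
}

# Normalize every event except the denominator itself.
shares = {
    name: cycle_share(value, counts["PM_RUN_CYC"])
    for name, value in counts.items()
    if name != "PM_RUN_CYC"
}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of run cycles")
```

With numbers like these, the goal as I understand it would be to push the two stall shares down and the PM_GRP_CMPL share up.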