How to capture an accurate count of retired instructions using hardware performance counters?

Question

The accuracy of hardware performance counter measurements is discussed widely in the literature. Using hardware performance counters, we can measure many types of microarchitectural events, such as cache hits and misses, loads and stores, and retired instructions. However, the accuracy of these measurements is still in question: how accurate are they? As illustrated in many papers, results can vary across devices. Some of these events, like store instructions, are deterministic, i.e. re-executing the program and capturing the counter values again gives the same store count. Retired instructions are not: if we measure a part of the code, like a loop statement, we may get a different counter value from one run to the next. In [this article], the author wrote:

"When deterministic counters do become available, they will be welcomed not only by those working on deterministic replay and simulator validators, but also by all users of performance counters."
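A minimal sketch of measuring a code region this way on Linux, assuming the `perf_event_open(2)` interface on x86-64 or AArch64 and a kernel/container configuration that permits user-space counting (the helper returns `None` otherwise). It opens the generic `PERF_COUNT_HW_INSTRUCTIONS` event restricted to user space, then resets, enables, runs, disables and reads it several times so that run-to-run differences become visible. The helper name `count_user_instructions` is hypothetical. Because this is Python, each count also includes interpreter instructions and the user-space part of the `ioctl` calls between enable and disable, not only the loop body. A C version would follow the same sequence of calls.

```python
import ctypes
import fcntl
import os
import platform
import struct
import sys

# Values from <linux/perf_event.h>.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_INSTRUCTIONS = 1
PERF_ATTR_SIZE_VER0 = 64         # size of the first published struct perf_event_attr
PERF_EVENT_IOC_ENABLE = 0x2400   # _IO('$', 0)
PERF_EVENT_IOC_DISABLE = 0x2401  # _IO('$', 1)
PERF_EVENT_IOC_RESET = 0x2403    # _IO('$', 3)
ATTR_DISABLED = 1 << 0           # start stopped; enabled explicitly below
ATTR_EXCLUDE_KERNEL = 1 << 5     # count user-space instructions only
ATTR_EXCLUDE_HV = 1 << 6

# perf_event_open has no glibc wrapper, so call it by its per-architecture number.
PERF_EVENT_OPEN_NR = {"x86_64": 298, "aarch64": 241}


def count_user_instructions(fn, repeats=5):
    """Return one retired-instruction count per call of fn(), or None if the
    counter cannot be opened (not Linux, unknown arch, no PMU, not permitted)."""
    nr = PERF_EVENT_OPEN_NR.get(platform.machine())
    if not sys.platform.startswith("linux") or nr is None:
        return None
    # Little-endian 64-byte layout: type, size, config, sample_period,
    # sample_type, read_format, flag bits, wakeup_events, bp_type, config1.
    attr = struct.pack("<IIQQQQQIIQ",
                       PERF_TYPE_HARDWARE, PERF_ATTR_SIZE_VER0,
                       PERF_COUNT_HW_INSTRUCTIONS, 0, 0, 0,
                       ATTR_DISABLED | ATTR_EXCLUDE_KERNEL | ATTR_EXCLUDE_HV,
                       0, 0, 0)
    buf = (ctypes.c_char * len(attr)).from_buffer_copy(attr)
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    fd = libc.syscall(ctypes.c_long(nr), ctypes.byref(buf),
                      ctypes.c_long(0),    # pid 0: the calling thread
                      ctypes.c_long(-1),   # cpu -1: whichever CPU it runs on
                      ctypes.c_long(-1),   # group_fd -1: standalone event
                      ctypes.c_ulong(0))   # flags
    if fd < 0:
        return None
    counts = []
    try:
        for _ in range(repeats):
            fcntl.ioctl(fd, PERF_EVENT_IOC_RESET, 0)
            fcntl.ioctl(fd, PERF_EVENT_IOC_ENABLE, 0)
            fn()
            fcntl.ioctl(fd, PERF_EVENT_IOC_DISABLE, 0)
            # With read_format == 0 the counter reads back as a single u64.
            counts.append(struct.unpack("<Q", os.read(fd, 8))[0])
    finally:
        os.close(fd)
    return counts


if __name__ == "__main__":
    def loop():
        total = 0
        for i in range(1000):
            total += i
        return total

    print(count_user_instructions(loop))
```

Comparing the returned counts for the same `fn` is one direct way to see the run-to-run variation described in the question.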

Also, can we use deterministic events like store events in tandem with the retired-instruction event to define a deterministic user-defined event?
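Whether a candidate combined event is repeatable can at least be checked empirically. Measure the component events together over the same interval (for example as one perf event group) on repeated runs, compute the combination per run, and look at its spread. The sketch below covers only that bookkeeping. The helper names and the event keys `instructions` and `stores` are hypothetical, and obtaining the counts themselves is outside its scope.

```python
def spread(counts):
    """Max minus min over repeated runs; 0 means every run gave the same count."""
    if not counts:
        raise ValueError("need at least one run")
    return max(counts) - min(counts)


def combined_counts(runs, combine):
    """Evaluate combine(**run) for each run.

    Each run is a dict of counts read over the same interval, e.g.
    {"instructions": ..., "stores": ...} (hypothetical event keys).
    """
    return [combine(**run) for run in runs]


def repeatable(counts):
    """Deterministic in the question's sense: identical on every run."""
    return spread(counts) == 0


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not measurements.
    runs = [{"instructions": 1000, "stores": 40},
            {"instructions": 1003, "stores": 40}]
    stores = [r["stores"] for r in runs]
    derived = combined_counts(runs, lambda instructions, stores: instructions - stores)
    print(repeatable(stores), repeatable(derived), spread(derived))  # True False 3
```

The spread is taken as max minus min so that an event counts as deterministic only if it repeats exactly, which is the definition used in the question.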

Any help will be appreciated.

In practice on Intel hardware, the `instructions` event is highly repeatable (like `uops_executed.thread` and `uops_issued.any` if branch misses are negligible), to within 1 part per 10k using `perf` for a whole process that runs only 500M instructions in ~300M cycles (i.e. quite short). That includes startup overhead and, I think, also interrupt handlers, since I'm doing this in a regular user-space Linux process. Are you trying to write a cycle-accurate simulator? Is there some reason the regular `instructions` event isn't sufficient for what you're doing? — Peter Cordes, Aug 22 '18 at 11:21
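To put the 1-part-per-10k figure in absolute terms for a run of about 500M instructions:

$$
\frac{500 \times 10^{6}\ \text{instructions}}{10^{4}} = 5 \times 10^{4}\ \text{instructions}
$$

So a bound that tight at whole-process scale still allows tens of thousands of instructions of run-to-run difference, far more than a 10-instruction basic block; on its own it does not show per-block exactness.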
@PeterCordes thank you. I hope to use hardware counters to count the number of instructions executed within each basic block of a program, in order to detect any violation of its correct behavior. Therefore, I need a way to capture counter values accurately. If a basic block has 10 instructions but, due to a perturbation in the hardware counters, the captured value is 11, the CFC (Control Flow Checking) method will report this as an error. — husin alhaj ahmade, Aug 22 '18 at 11:47
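If the perturbation were a fixed cost of the start/stop sequence, one common mitigation (not part of the CFC method described here) is to calibrate that cost by measuring an empty region and subtract it, optionally allowing a small slack. The sketch below uses hypothetical helper names. It assumes the overhead is constant across runs, and that is exactly what is in doubt for retired instructions.

```python
def calibrate_overhead(empty_region_counts):
    """Fixed cost of the start/stop sequence itself, estimated as the minimum
    count seen when an empty region is measured (an assumption, see lead-in)."""
    if not empty_region_counts:
        raise ValueError("need at least one calibration run")
    return min(empty_region_counts)


def block_ok(expected, measured, overhead=0, slack=0):
    """CFC-style check for one basic block: after removing the measurement
    overhead, the count must match the block's static instruction count to
    within `slack` instructions. Any slack > 0 also hides real deviations of
    up to that size."""
    return abs((measured - overhead) - expected) <= slack


if __name__ == "__main__":
    # The case from the comment: 10 expected, 11 measured, exact comparison.
    print(block_ok(10, 11))  # False
    # Same reading after subtracting a hypothetical 1-instruction overhead.
    print(block_ok(10, 11, overhead=calibrate_overhead([1, 1, 2])))  # True
```

The trade-off is explicit in `slack`: every instruction of tolerance is also an instruction of control-flow deviation the check can no longer detect.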
The best way to do that is to instrument the code either statically or dynamically at the basic block level. This incurs higher performance overhead, but it's accurate. I cannot think of an easy way using the PMU events. — Hadi Brais, Aug 22 '18 at 22:17
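As a rough Python analogue of the dynamic instrumentation suggested here, the sketch below uses the standard `sys.settrace` hook to count how many times each source line of one function executes. It works at line granularity, not basic-block or machine-instruction granularity, and the helper name `count_lines` is hypothetical. For native code, binary instrumentation frameworks such as Intel Pin or DynamoRIO fill this role. The point it illustrates is that instrumentation counts depend only on the executed path, so they repeat exactly, at the cost of a large slowdown.

```python
import sys
from collections import Counter


def count_lines(fn, *args):
    """Run fn(*args) and return a Counter mapping each source line of fn,
    as an offset from its `def` line, to how many times that line executed."""
    target = fn.__code__
    counts = Counter()

    def tracer(frame, event, arg):
        # Global hook: called for every new frame; trace only fn's own frames.
        if frame.f_code is not target:
            return None
        if event == "line":
            counts[frame.f_lineno - target.co_firstlineno] += 1
        return tracer  # keep receiving local events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(previous)
    return counts


def summed(n):
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    first = count_lines(summed, 5)
    print(first[3])                         # 5: `total += i` ran once per iteration
    print(first == count_lines(summed, 5))  # True: identical on every run
```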
I think hardware performance counter accuracy depends on many factors that influence the measurement. In general, I would say that it is very hard (maybe impossible) to measure retired instructions with accuracy close to 100%. — husin alhaj ahmade, Aug 23 '18 at 07:09


0 Answers