The accuracy of hardware performance counter measurements is discussed widely in the literature. Using hardware performance counters, we can measure many types of micro-architectural events, such as cache hits and misses, loads and stores, and retired instructions. However, the accuracy of these measurements is still an open question: how accurate are they? As many papers show, results can vary from one device to another. Some of these events, like store instructions, are deterministic, i.e. the measured store count does not change when the program is re-executed and the counter values are captured again. Retired instructions are not: if we measure a part of the code, such as a loop, we may get a different counter value from one run to the next. In [this article], the author wrote:
"When deterministic counters do become available, they will be welcomed not only by those working on deterministic replay and simulator validators, but also by all users of performance counters."
So, can we use deterministic events such as store events in tandem with retired instructions to define a deterministic user-defined event?
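To make the re-execution criterion concrete, here is a minimal sketch, under stated assumptions, of how one might check empirically whether a raw or derived ("user-defined") event is deterministic for a given code region. It assumes the counter readings for each run have already been collected by some tool (for example perf or perf_event_open); the event names "stores" and "instructions" and the derived event "instructions_minus_stores" are hypothetical labels chosen for illustration, not names defined by any tool.

```python
from typing import Callable, Dict, Mapping, Sequence

# One run's counter readings for the measured code region, keyed by event name.
# ASSUMPTION: "stores" and "instructions" are hypothetical labels; map them to
# whatever events your tool and CPU actually expose.
Reading = Mapping[str, int]
DerivedEvent = Callable[[Reading], int]


def spread(values: Sequence[int]) -> int:
    """Max minus min over repeated runs; 0 means the count never changed."""
    return max(values) - min(values)


def determinism_report(runs: Sequence[Reading],
                       events: Mapping[str, DerivedEvent]) -> Dict[str, int]:
    """Spread of each raw or derived event across runs of the same code region."""
    if not runs:
        raise ValueError("need at least one run")
    return {name: spread([f(r) for r in runs]) for name, f in events.items()}


# Candidate events: two raw counters and one hypothetical user-defined combination.
EVENTS: Dict[str, DerivedEvent] = {
    "stores": lambda r: r["stores"],
    "instructions": lambda r: r["instructions"],
    "instructions_minus_stores": lambda r: r["instructions"] - r["stores"],
}
```

An event whose spread stays at 0 over many runs behaved deterministically in that particular experiment; this is empirical evidence only, since run-to-run variation may depend on the device and on how the measurement is set up.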
Any help would be appreciated.