I'm trying to use Intel Performance Counter Monitor (PCM) to measure L3 cache misses and some other performance metrics in my code.
I'm not sure how to interpret the numbers I'm getting and would appreciate some insight.
Ideally, I would expect the following piece of code to report 0 bytes read; instead, I get a number close to 240KB. If I run other processes at the same time, the number fluctuates: it doesn't go monotonically up or down, but rises first and then falls.
volatile SystemCounterState before_sstate = getSystemCounterState();
volatile SystemCounterState after_sstate = getSystemCounterState();
cout << "Instructions per clock: " << getIPC( before_sstate, after_sstate )
     << ", L3 Cache hit ratio: " << getL3CacheHitRatio( before_sstate, after_sstate )
     << ", L3 Missed Cycles: " << getCyclesLostDueL3CacheMisses( before_sstate, after_sstate )
     << ", Bytes read: " << getBytesReadFromMC( before_sstate, after_sstate )
     << ", L3 Occupancy: " << getL3CacheOccupancy( after_sstate ) << endl;
Here is the output I'm getting:
Trying to use Linux perf events...
Successfully programmed on-core PMU using Linux perf
Instructions per clock: 0.637448, L3 Cache hit ratio: 0.820139, Missed Cycles: 0.075492, Bytes read: 263488, L3 Occupancy: 0
Does anyone know why I'm getting about 240KB read even though the code doesn't actually read anything? Is my process sharing compute resources with other processes, so that the counters also capture their activity? If that's the case, how can I make sure the captured statistics are isolated to this code/process?