
I'm trying to use Intel Performance Counter Monitor (PCM) to measure L3 cache misses and some other performance metrics in my code.

I'm not sure how to make sense out of the numbers I'm getting and would appreciate some insight.

Ideally, I expect the following piece of code to report 0 bytes read; instead, it reports close to 240KB read. If I run other processes at the same time, the 240KB number fluctuates: it doesn't go up or down monotonically, but rises first and then falls again.

    volatile SystemCounterState before_sstate = getSystemCounterState();
    volatile SystemCounterState after_sstate = getSystemCounterState();

    cout << "Instructions per clock: "  << getIPC( before_sstate, after_sstate )
         << ", L3 Cache hit ratio: "    << getL3CacheHitRatio( before_sstate, after_sstate )
         << ", L3 Missed Cycles: "         << getCyclesLostDueL3CacheMisses(before_sstate, after_sstate )
         << ", Bytes read: "            << getBytesReadFromMC( before_sstate, after_sstate )
         << ", L3 Occupancy: "          << getL3CacheOccupancy( after_sstate ) << endl;

Here is the output I'm getting:

    Trying to use Linux perf events...
    Successfully programmed on-core PMU using Linux perf
    Instructions per clock: 0.637448, L3 Cache hit ratio: 0.820139, Missed Cycles: 0.075492, Bytes read: 263488, L3 Occupancy: 0
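To put the reported figure in perspective, here is a small sketch that converts the byte count into cache lines and KiB. It assumes the memory controller counts reads in 64-byte cache-line units, which is the usual line size on Intel x86; the helper names `bytesToCacheLines` and `bytesToKiB` are made up for illustration and are not part of PCM.

```cpp
#include <cstdint>

// Assumption: memory-controller reads are counted in 64-byte cache lines.
constexpr std::uint64_t kCacheLineBytes = 64;

// Hypothetical helper: number of whole cache lines in a byte count.
constexpr std::uint64_t bytesToCacheLines(std::uint64_t bytes) {
    return bytes / kCacheLineBytes;
}

// Hypothetical helper: byte count expressed in KiB (1024 bytes).
constexpr double bytesToKiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1024.0;
}
```

Under that assumption, `bytesToCacheLines(263488)` gives 4117 lines and `bytesToKiB(263488)` gives about 257.3 KiB, i.e. a few thousand cache-line reads between the two snapshots.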

Does anyone know why I'm getting 240KB read even though the code doesn't read anything? Is the measurement sharing compute resources with other processes and possibly capturing their stats as well? If that's the case, how can I make sure the captured information is isolated to this code/process?

Hadi Brais
Amir
  • By looking at the [implementation](https://github.com/opcm/pcm/blob/0e9461a962382e55d1644f3f4320092a32fd29bd/cpucounters.cpp#L3701) of `getSystemCounterState`, we can see that it does accumulate the event counts over all cores, which is apparently not what you want. – Hadi Brais Mar 02 '19 at 20:52
  • Thanks for that info. You're right, it sounds like the results are aggregated over all the processors in the system. Is there an API inside PCM that returns the ID of the core running this PCM code? – Amir Mar 04 '19 at 22:07
  • The two solutions that come to mind are: 1) Boot Linux in single-processor mode. 2) Use smp_processor_id(). Do you have a better approach? – Amir Mar 04 '19 at 22:08
  • Ok. So I offlined all the cores except for one (approach 1), and I'm getting this error: "PCM does not support using Linux perf API on systems with offlined cores. Falling-back to direct PMU programming." Any ideas? – Amir Mar 04 '19 at 23:45
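One possible alternative to offlining cores, sketched under assumptions: `smp_processor_id()` is a kernel-side call, while from user space on Linux the thread can be pinned with `sched_setaffinity` and the current core queried with `sched_getcpu`. The helpers `pinToCore` and `currentCore` below are illustrative names, not PCM functions. The resulting core ID could then presumably be passed to PCM's per-core `getCoreCounterState(core)` instead of `getSystemCounterState()`. Note that memory-controller traffic is likely an uncore (per-socket) quantity, so even with pinning the bytes-read figure may still include other processes.

```cpp
#include <sched.h>   // sched_setaffinity, sched_getcpu, CPU_* macros (Linux/glibc)

// Hypothetical helper: pin the calling thread to one logical core.
// Returns true on success, false if the core is invalid or not allowed.
bool pinToCore(int core) {
    if (core < 0 || core >= CPU_SETSIZE) {
        return false;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    // pid 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical helper: logical core the calling thread is running on,
// or -1 if the query fails.
int currentCore() {
    return sched_getcpu();
}
```

A typical use would be to call `pinToCore(currentCore())` before taking the first snapshot, so both snapshots come from the same core and the scheduler cannot migrate the thread in between.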
