I am trying to read one of the hardware counters with PAPI. Reading events from the perf_event list works fine. However, I now need to read a counter from the perf_event_uncore list (as reported by papi_native_avail), and I get an error. The machine is a Cascade Lake system running Linux 5.4.0-3-amd64.

int err = PAPI_event_name_to_code("skx_unc_imc0::UNC_M_WPQ_CYCLES_FULL", &native);
if (err != PAPI_OK)
    printf("PAPI_event_name_to_code error: %d\n", err);

err = PAPI_add_event(EventSet, native);
if (err != PAPI_OK)
    printf("PAPI_add_event error: %d\n", err);

Even though PAPI_event_name_to_code returns PAPI_OK, PAPI_add_event returns -1, which is PAPI_EINVAL (invalid argument). I tried several counters from perf_event_uncore and got the same problem. Do I need to use a different function to add this event to the eventset, or is there something else that I am doing wrong?

Ana Khorguani

1 Answer

I found something that seems to be a solution. After adding a cpu=0 qualifier, like this: PAPI_event_name_to_code("skx_unc_imc0::UNC_M_WPQ_CYCLES_FULL:cpu=0",&native), PAPI_add_event no longer returns an error.

Also note that I have checked, and this specific hardware counter can't be counted together with others, so it should be the only event in the eventset.
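
For reference, the fix above can be put into a complete program roughly as follows. This is only a sketch under some assumptions: qualify_with_cpu is a hypothetical helper (it is not part of PAPI) that just appends the ":cpu=N" qualifier to the event name, the USE_PAPI macro is an illustrative build switch, and the process is assumed to have permission to read uncore counters on the machine. The PAPI part is only compiled when building with -DUSE_PAPI and linking against libpapi.

```c
/* Sketch only. Build with: gcc -DUSE_PAPI example.c -lpapi
 * Without -DUSE_PAPI only the name helper below is compiled. */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a PAPI function): writes "<event>:cpu=<cpu>"
 * into buf. Returns 0 on success, -1 if buf is too small. */
int qualify_with_cpu(char *buf, size_t len, const char *event, int cpu)
{
    int n = snprintf(buf, len, "%s:cpu=%d", event, cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

#ifdef USE_PAPI
#include <papi.h>

int main(void)
{
    char name[PAPI_MAX_STR_LEN];
    int event_set = PAPI_NULL;
    int native = 0;
    long long value = 0;
    int err;

    /* PAPI_library_init returns the library version on success. */
    err = PAPI_library_init(PAPI_VER_CURRENT);
    if (err != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI_library_init failed: %d\n", err);
        return 1;
    }

    /* Same event as in the question, pinned to CPU 0. */
    if (qualify_with_cpu(name, sizeof name,
                         "skx_unc_imc0::UNC_M_WPQ_CYCLES_FULL", 0) != 0)
        return 1;

    /* The uncore event is the only event in this eventset. */
    if ((err = PAPI_create_eventset(&event_set)) != PAPI_OK ||
        (err = PAPI_event_name_to_code(name, &native)) != PAPI_OK ||
        (err = PAPI_add_event(event_set, native)) != PAPI_OK ||
        (err = PAPI_start(event_set)) != PAPI_OK) {
        fprintf(stderr, "PAPI setup failed: %s\n", PAPI_strerror(err));
        return 1;
    }

    /* ... code to be measured goes here ... */

    if ((err = PAPI_stop(event_set, &value)) != PAPI_OK) {
        fprintf(stderr, "PAPI_stop failed: %s\n", PAPI_strerror(err));
        return 1;
    }

    printf("%s = %lld\n", name, value);
    return 0;
}
#endif
```

Using PAPI_strerror instead of printing the raw error code makes failures such as PAPI_EINVAL easier to read.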

Ana Khorguani
  • Uncore events are global and can't be used as per-thread counters (they are hard to reschedule correctly). So it seems that "cpu=0" switches the request from a per-thread counter to a global (per-socket) one; on NUMA systems there is a separate set of uncore counters for every chip (socket, NUMA node). Thank you for sharing your solution. – osgx Mar 31 '20 at 10:14
  • @osgx hi, thanks a lot for the explanation. I have not thought about that to be honest, but it makes sense :) yes, even for this skx_unc_imc0-5 there are 6 counters. And each socket has 6 channels to DIMMs, so I believe each counter is for a different channel. – Ana Khorguani Mar 31 '20 at 15:48
  • There is the documentation https://www.intel.com/content/dam/www/public/us/en/documents/manuals/6th-gen-core-family-uncore-performance-monitoring-manual.pdf, Uncore PMU Counter Summary lists boxes and their counters and max number of instances (1 to 4 C-boxes - CBo). IMC is global for chip and has 5 fixed counter registers of 32 bit: "Five model specific, fixed counters that allow for monitoring the number of requests to DRAM." (3.3) "This set of counters are free-running and always-running. Software can read the value, wait for a desired internal, read again, then subtract ..." – osgx Mar 31 '20 at 20:13