I'm using PAPI to collect performance and energy information on a Skylake processor. The target application is multi-threaded, and I want to aggregate statistics across all running threads. This works fine if I only track non-RAPL events, but if I try to track both RAPL and CPU counters, the CPU counters are not aggregated (i.e., they only correspond to one thread).
Everything else seems to be working properly: I check the error codes of all PAPI calls, and every one returns PAPI_OK.
I apply PAPI_INHERIT_ALL to my event set for component 0 (CPU). Doing the same for the RAPL component fails, so I don't do that.
The output below is from two runs of my test program. The only difference is that the second run includes rapl:::PACKAGE_ENERGY:PACKAGE0. Without the RAPL event, the cycle and instruction counts scale with thread count. With it, they don't (but the energy counter shows package energy increasing with thread count).
I'm running under papi-5.7.0.
uname -a:
Linux 80b3989af663 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Without a RAPL event
WallTime | threads | PAPI_TOT_CYC | PAPI_TOT_INS
-------------------------------------------------
2.0 | 1.0 | 2.997G | 12.945G
2.0 | 2.0 | 5.995G | 25.888G
2.0 | 3.0 | 8.992G | 38.835G
2.0 | 4.0 | 11.989G | 51.778G
With a RAPL event
WallTime | threads | PAPI_TOT_CYC | PAPI_TOT_INS | rapl:::PACKAGE_ENERGY:PACKAGE0
-----------------------------------------------------------------------------------
1.999 | 1.0 | 2.997G | 12.944G | 10.643G
2.0 | 2.0 | 2.997G | 12.945G | 12.896G
2.0 | 3.0 | 2.997G | 12.92G | 16.109G
2.0 | 4.0 | 2.997G | 12.946G | 19.471G
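
To make the difference concrete, here is a small sketch that divides each counter by its single-thread value. The numbers are copied from the two tables above (in units of 1e9, dropping the G suffix); the variable names and the scaling helper are only for illustration and are not part of my test program.

```python
# Values copied from the two tables above, in units of 1e9 (the "G" suffix).
cyc_no_rapl = [2.997, 5.995, 8.992, 11.989]    # PAPI_TOT_CYC, no RAPL event
ins_no_rapl = [12.945, 25.888, 38.835, 51.778] # PAPI_TOT_INS, no RAPL event
cyc_rapl = [2.997, 2.997, 2.997, 2.997]        # PAPI_TOT_CYC, with RAPL event
ins_rapl = [12.944, 12.945, 12.92, 12.946]     # PAPI_TOT_INS, with RAPL event


def scaling(values):
    """Return each value relative to the single-thread (first) value."""
    base = values[0]
    return [round(v / base, 2) for v in values]


for name, series in [
    ("cycles, no RAPL", cyc_no_rapl),
    ("instructions, no RAPL", ins_no_rapl),
    ("cycles, with RAPL", cyc_rapl),
    ("instructions, with RAPL", ins_rapl),
]:
    print(f"{name:26s} {scaling(series)}")
```

Without the RAPL event the ratios track the thread count (about 1, 2, 3, 4). With it they stay at about 1 for every thread count, which is what I would expect if only one thread were being counted.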