I am running thermal load tests on a Skylake processor and am trying to use the RAPL MSRs as an early-warning system for oncoming thermal spikes, instead of reading the sysfs files that "sensors" reports.

I have several questions. As background, this is what I get when I run sensors:

acpitz-virtual-0
Adapter: Virtual device
temp1:        +43.0°C  (crit = +119.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +42.5°C  

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +43.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:         +41.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:         +41.0°C  (high = +100.0°C, crit = +100.0°C)

And when I read the RAPL MSRs, I get the following data points, as described on Intel's now-deprecated page here.

Package energy: 2.493103J
PowerPlane0 (cores): 0.105652J
PowerPlane1 (on-core GPU if avail): 0.106750J
DRAM: 0.619141J
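
For context, here is a minimal sketch of how joule values like these can be derived from the raw counters. It assumes the Linux msr driver is loaded (so /dev/cpu/N/msr exists and is readable as root), that the energy unit sits in bits 12:8 of MSR_RAPL_POWER_UNIT (0x606), and that each energy status MSR holds a 32-bit counter. The function names and the sample raw values are hypothetical, chosen only for illustration, and some parts may use a different unit for the DRAM domain.

```python
import os
import struct

MSR_RAPL_POWER_UNIT = 0x606    # energy unit in bits 12:8
MSR_PKG_ENERGY_STATUS = 0x611  # package energy counter

def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR via /dev/cpu/<cpu>/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)

def energy_unit_joules(power_unit_msr: int) -> float:
    """Energy unit is 1 / 2**ESU joules, where ESU is bits 12:8."""
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)

def raw_to_joules(raw_status: int, unit_j: float) -> float:
    """Only the low 32 bits of an energy status MSR carry the counter."""
    return (raw_status & 0xFFFFFFFF) * unit_j

# Hypothetical raw values, not taken from a real machine.
sample_power_unit = 0x000A0E03  # ESU field = 14, so one count = 2**-14 J
sample_pkg_raw = 40847
unit = energy_unit_joules(sample_power_unit)
pkg_joules = raw_to_joules(sample_pkg_raw, unit)
print(f"Package energy: {pkg_joules:.6f}J")
```

On a real system the two sample values would be replaced by read_msr(0, MSR_RAPL_POWER_UNIT) and read_msr(0, MSR_PKG_ENERGY_STATUS).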

  1. I am trying to find a relationship between the energy readings and the temperatures. For example, which of the temperatures is the GPU temperature? Which one is DRAM? How can I find out where these sensors are located?
  2. Are there any MSR-based ways to throttle the CPUs from user space? One easy method is to set /sys/devices/system/cpu/intel_pstate/no_turbo, but this does not seem to be the right approach. Is there a formal way to throttle the CPU or reduce the load on the system?
  3. Does RAPL also provide "power" in addition to energy? Can I deduce other details, such as remaining battery life, from the MSR readings? Is there anything else useful that can be derived from the MSRs?
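
Regarding power: the energy counters can at least be turned into an average power figure by differencing two samples and dividing by the elapsed time. The sketch below assumes a 32-bit counter that wraps around and an energy unit already known in joules per count; the function name and the numbers are hypothetical.

```python
def average_power_watts(raw_before: int, raw_after: int,
                        unit_j: float, interval_s: float) -> float:
    """Average power over an interval from two raw 32-bit energy counter samples."""
    # Masking the difference handles a single counter wraparound.
    delta_counts = (raw_after - raw_before) & 0xFFFFFFFF
    return delta_counts * unit_j / interval_s

# Hypothetical samples taken one second apart with a 2**-14 J unit.
watts = average_power_watts(1000, 1000 + 5 * 16384, 2 ** -14, 1.0)
print(f"{watts:.3f} W")
```

The sampling interval has to be short enough that the counter cannot wrap more than once between reads.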
chimp45