On a cluster I am working on, one node is showing high CPU temperatures.
The node has 2 Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz.
The sensors command from lm-sensors shows one CPU at around 70°C and the other at 90°C. The load is 100%; the node is in fact overloaded, but the load cannot be reduced. The temperature is highly correlated with the load. The reported current frequency is higher than the reported maximum frequency (max: 2400000, cur: 5280000), so I do not think the CPUs are being throttled.
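For reference, here is a minimal sketch of how the frequency readings and throttling could be checked per core. It assumes the figures above come from the Linux cpufreq sysfs files (`scaling_cur_freq` and `cpuinfo_max_freq`, both in kHz) and that the kernel exposes the Intel `thermal_throttle` counters; neither is confirmed for this node. The names `read_int` and `cpu_report` are only illustrative.

```python
from pathlib import Path

# Standard Linux sysfs location; assumed, not confirmed for this node.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def cpu_report(root=SYSFS_CPU):
    """Collect per-core frequencies (kHz) and thermal throttle counters."""
    report = {}
    for cpu_dir in Path(root).glob("cpu[0-9]*"):
        report[cpu_dir.name] = {
            "cur_khz": read_int(cpu_dir / "cpufreq" / "scaling_cur_freq"),
            "max_khz": read_int(cpu_dir / "cpufreq" / "cpuinfo_max_freq"),
            # These counters only exist when the Intel thermal driver exposes them.
            "core_throttle_count": read_int(
                cpu_dir / "thermal_throttle" / "core_throttle_count"
            ),
            "package_throttle_count": read_int(
                cpu_dir / "thermal_throttle" / "package_throttle_count"
            ),
        }
    return report


if __name__ == "__main__":
    # Sort numerically by core index (cpu0, cpu1, ..., cpu10, ...).
    for name, values in sorted(cpu_report().items(), key=lambda kv: int(kv[0][3:])):
        print(name, values)
```

A throttle counter that keeps increasing under load would indicate thermal throttling even if the reported frequency looks high.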
Is the temperature difference between the two CPUs a sign of a cooling issue?
The Intel documentation lists a case temperature (Tcase) of 86°C. As I understand it, this means that running the CPU at 90°C will shorten its lifespan.
The node has been at these temperatures for almost a week. Should I look into a solution, such as reducing the CPU speed, to bring the temperature down? The node will probably run other CPU-intensive jobs in the future.
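To make "reduce CPU speed" concrete, here is a hedged sketch of one way it might be done: capping `scaling_max_freq` through the cpufreq sysfs interface. It assumes a Linux cpufreq driver that honours this file and root privileges for the actual write; the function name `cap_max_freq` and the 2000000 kHz value are purely illustrative. It defaults to a dry run so that nothing is changed by accident.

```python
from pathlib import Path

# Standard Linux sysfs location; assumed, not confirmed for this node.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def cap_max_freq(limit_khz, root=SYSFS_CPU, dry_run=True):
    """Cap scaling_max_freq on every core.

    Returns the list of files that were (or, in a dry run, would be) written.
    """
    touched = []
    for path in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_max_freq")):
        if not dry_run:
            path.write_text(f"{limit_khz}\n")  # needs root on a real system
        touched.append(path)
    return touched


if __name__ == "__main__":
    # 2000000 kHz (2.0 GHz) is an arbitrary illustrative limit, not a recommendation.
    for path in cap_max_freq(2_000_000):
        print("would write 2000000 to", path)
```

A cap set this way does not survive a reboot, so it is easy to undo if the temperature does not improve.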