
On a cluster I am working on, one node is showing high CPU temperatures.

The node has two Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz processors.

The sensors command from lm-sensors shows one CPU at around 70°C and the other at around 90°C. The load is 100%: the node is in fact overloaded, but the load cannot be reduced. The temperature is highly correlated with the load. The current frequency is higher than the maximum frequency (max: 2400000, cur: 5280000), so I do not think there is any throttling.
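For reference, a minimal Python sketch that collects the same numbers without lm-sensors. It assumes a Linux host with the coretemp driver exposing `/sys/class/hwmon/hwmon*/temp*_label` and `temp*_input` (millidegrees Celsius), plus the cpufreq files under `/sys/devices/system/cpu/cpu*/cpufreq/` (kHz). The function names are hypothetical, and reading the maximum from `cpuinfo_max_freq` is an assumption; the question does not say where its max/cur values came from.

```python
import glob
import os


def read_text(path):
    """Return the stripped contents of a small sysfs file."""
    with open(path) as f:
        return f.read().strip()


def package_temps(hwmon_root="/sys/class/hwmon"):
    """Return {label: degrees C} for the 'Package id N' sensors of coretemp."""
    temps = {}
    for hwmon in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        try:
            if read_text(os.path.join(hwmon, "name")) != "coretemp":
                continue
            for label_file in sorted(glob.glob(os.path.join(hwmon, "temp*_label"))):
                label = read_text(label_file)
                if label.startswith("Package id"):
                    input_file = label_file[: -len("_label")] + "_input"
                    # coretemp reports millidegrees Celsius
                    temps[label] = int(read_text(input_file)) / 1000.0
        except OSError:
            continue  # unreadable or non-standard hwmon entry
    return temps


def freq_report(cpu_root="/sys/devices/system/cpu"):
    """Yield (cpu, cur_khz, max_khz) for every CPU that exposes cpufreq."""
    for cpu_dir in sorted(glob.glob(os.path.join(cpu_root, "cpu[0-9]*"))):
        freq_dir = os.path.join(cpu_dir, "cpufreq")
        try:
            cur = int(read_text(os.path.join(freq_dir, "scaling_cur_freq")))
            mx = int(read_text(os.path.join(freq_dir, "cpuinfo_max_freq")))
        except OSError:
            continue  # cpufreq not available for this CPU
        yield os.path.basename(cpu_dir), cur, mx


if __name__ == "__main__":
    temps = package_temps()
    for label, celsius in sorted(temps.items()):
        print(f"{label}: {celsius:.1f} C")
    if len(temps) >= 2:
        gap = max(temps.values()) - min(temps.values())
        print(f"Gap between packages: {gap:.1f} C")
    for cpu, cur, mx in freq_report():
        note = "  (above cpuinfo_max_freq)" if cur > mx else ""
        print(f"{cpu}: cur={cur} kHz, max={mx} kHz{note}")
```

On a two-socket machine, each package normally shows up as its own coretemp hwmon device, so the printed gap is a direct check of the 20°C difference.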

Is the temperature difference a sign of a cooling issue?

Intel's documentation lists a case temperature (TCASE) of 86°C. From what I understand, this means that running the CPU at 90°C will shorten its lifespan.

It has been almost a week at these temperatures. Should I look into a solution (such as reducing the CPU speed) to lower the CPU temperature? The node will probably run other CPU-intensive jobs in the future.
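If reducing the CPU speed is the route taken, one common approach on Linux is to lower the cpufreq `scaling_max_freq` limit on every CPU. This is a minimal sketch assuming the standard cpufreq sysfs interface (values in kHz); `cap_max_freq` and the 2000000 kHz (2.0 GHz) target are hypothetical choices. It defaults to a dry run, real writes need root, and the limit does not survive a reboot. From the shell, `cpupower frequency-set --max` does the same job.

```python
import glob
import os


def cap_max_freq(target_khz, cpu_root="/sys/devices/system/cpu", dry_run=True):
    """Clamp target_khz to each CPU's hardware range and set scaling_max_freq.

    Returns the (path, value) pairs that were written, or would be written
    when dry_run is True.
    """
    planned = []
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq")
    for freq_dir in sorted(glob.glob(pattern)):
        try:
            with open(os.path.join(freq_dir, "cpuinfo_min_freq")) as f:
                hw_min = int(f.read())
            with open(os.path.join(freq_dir, "cpuinfo_max_freq")) as f:
                hw_max = int(f.read())
        except OSError:
            continue  # no cpufreq limits exposed for this CPU
        value = min(max(target_khz, hw_min), hw_max)
        path = os.path.join(freq_dir, "scaling_max_freq")
        planned.append((path, value))
        if not dry_run:
            # Requires root; the kernel forgets this setting on reboot.
            with open(path, "w") as f:
                f.write(str(value))
    return planned


if __name__ == "__main__":
    # Hypothetical 2.0 GHz cap, dry run only: prints what would be written.
    for path, value in cap_max_freq(2000000):
        print(f"would write {value} to {path}")
```

Clamping to `cpuinfo_min_freq`/`cpuinfo_max_freq` keeps the requested value inside the range the driver reports, so a typo cannot request an impossible frequency.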

Thomas

Running a CPU at those temperatures is within spec, but it will most likely shorten the lifespan of your components. You should definitely look into scaling, both horizontally and vertically, to reduce the load. If the hardware is on-premises, you could also check whether more efficient cooling options are available.

  • Do you have any idea why there is a 20°C gap between the CPUs? – Thomas Nov 03 '21 at 08:27
  • There could be several reasons. The internal load balancing might favor one CPU over the other (this needs to be verified), or the thermal paste on the hotter CPU may not have been applied correctly or may have degraded over time. – Erik Norman Nov 03 '21 at 13:26
  • If you find the reason, I'd be curious to know what it was. – Erik Norman Nov 04 '21 at 07:47
  • In fact, I already checked whether the load was the same on both CPUs, and it is. So I was already suspecting the thermal paste. I will try applying new thermal paste to the hotter CPU. – Thomas Nov 04 '21 at 10:13
  • I use Thermal Grizzly Conductonaut (unless the contact area between the CPU and the heatsink is aluminium). It really makes a difference. – Erik Norman Nov 04 '21 at 13:03
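The per-socket load check mentioned in the comments can be sketched by sampling `/proc/stat` twice and grouping CPUs by `/sys/devices/system/cpu/cpuN/topology/physical_package_id`. This assumes a Linux host; the function names and the 2-second sampling window are hypothetical choices, and iowait is counted as idle time here.

```python
import os
import time


def parse_proc_stat(text):
    """Return {cpu_index: (busy_jiffies, total_jiffies)} from /proc/stat text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-CPU lines ("cpu0", "cpu1", ...), not the "cpu" total.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Order: user nice system idle iowait irq softirq steal guest guest_nice
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values[:8])  # guest time is already counted in user/nice
        result[int(fields[0][3:])] = (total - idle, total)
    return result


def package_of(cpu, cpu_root="/sys/devices/system/cpu"):
    """Return the physical package (socket) id of a logical CPU."""
    path = os.path.join(cpu_root, f"cpu{cpu}", "topology", "physical_package_id")
    with open(path) as f:
        return int(f.read())


def per_package_utilization(before, after, package_map):
    """Busy fraction per package between two parse_proc_stat() snapshots."""
    busy, total = {}, {}
    for cpu, (b1, t1) in after.items():
        if cpu not in before:
            continue
        b0, t0 = before[cpu]
        pkg = package_map[cpu]
        busy[pkg] = busy.get(pkg, 0) + (b1 - b0)
        total[pkg] = total.get(pkg, 0) + (t1 - t0)
    return {pkg: busy[pkg] / total[pkg] for pkg in busy if total[pkg] > 0}


if __name__ == "__main__":
    with open("/proc/stat") as f:
        before = parse_proc_stat(f.read())
    time.sleep(2)  # hypothetical sampling window
    with open("/proc/stat") as f:
        after = parse_proc_stat(f.read())
    try:
        packages = {cpu: package_of(cpu) for cpu in after}
    except OSError as exc:
        print(f"cannot read CPU topology: {exc}")
    else:
        utilization = per_package_utilization(before, after, packages)
        for pkg, util in sorted(utilization.items()):
            print(f"package {pkg}: {util:.1%} busy")
```

If both packages report similar utilization while their temperatures differ by around 20°C, that supports the cooling or thermal paste explanation over a load imbalance.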