
Now that the warmer summer days are here, my server increasingly goes into thermal protection shutdown because temperature sensor #21 reaches its critical level (58°C), as reported in iLO.

The fans are all running fine, but unfortunately the server is not in a climate-controlled room and there isn't much I can do about the ambient temperature. However, if I know where sensor #21 is, I can try to improve the airflow around that area. And if it sits on an IC, I can add or beef up a heatsink to improve cooling.

Does anybody know where that sensor is?

Edit:

Levels at night time:

Sensor           Location        Status  Reading  Thresholds  
Temp 1:          System Zone     Ok      15C      Caution: 42C; Critical:47C  
Temp 2 (CPU 1):  System 1        Ok      40C      Caution: 82C; Critical:83C  
Temp 3 (CPU 2):  System 2        Ok      40C      Caution: 82C; Critical:83C  
Temp 4:          Memory Zone     Ok      37C      Caution: 87C; Critical:92C  
Temp 5:          Memory Zone     Ok      41C      Caution: 87C; Critical:92C  
Temp 6:          System Zone     n/a     n/a      Caution: 99C; Critical:99C  
Temp 7:          System Zone     n/a     n/a      Caution: 99C; Critical:99C  
Temp 8 (MemB0):  Memory Zone     Ok      34C      Caution: 62C; Critical:67C  
Temp 9 (MemB0):  Memory Zone     Ok      34C      Caution: 61C; Critical:66C  
Temp 10 (MemB0): Memory Zone     Ok      33C      Caution: 61C; Critical:66C  
Temp 12 (MemB1): Memory Zone     Ok      38C      Caution: 66C; Critical:71C  
Temp 13 (MemB1): Memory Zone     Ok      39C      Caution: 65C; Critical:70C  
Temp 14 (MemB1): Memory Zone     Ok      38C      Caution: 70C; Critical:75C  
Temp 15:         System Zone     Ok      40C      Caution: 57C; Critical:62C  
Temp 16:         System Zone     Ok      33C      Caution: 50C; Critical:55C  
Temp 17:         System Zone     Ok      35C      Caution: 58C; Critical:63C  
Temp 18:         System Zone     Ok      45C      Caution: 110C; Critical:115C  
Temp 19:         System Zone     Ok      41C      Caution: 57C; Critical:62C  
Temp 20:         System Zone     Ok      42C      Caution: 53C; Critical:58C  
Temp 21:         System Zone     Ok      49C      Caution: 53C; Critical:58C  
Temp 22 (PCIR):  I/O Board Zone  n/a     n/a      Caution: 99C; Critical:99C  
Temp 23 (PCIR):  I/O Board Zone  n/a     n/a      Caution: 99C; Critical:99C  
Temp 24 (PCIR):  I/O Board Zone  n/a     n/a      Caution: 99C; Critical:99C  
Temp 25 (PCIR):  I/O Board Zone  n/a     n/a      Caution: 99C; Critical:99C  
Temp 26:         Storage Zone    Ok      0C       Caution: 99C; Critical:99C  
Temp 27:         Storage Zone    Ok      0C       Caution: 99C; Critical:99C  
Temp 28:         Storage Zone    Ok      0C       Caution: 99C; Critical:99C  
Temp 29:         Storage Zone    Ok      0C       Caution: 99C; Critical:99C  
Temp 30:         Storage Zone    Ok      0C       Caution: 99C; Critical:99C  
Temp 31:         Storage Zone    Ok      0C       Caution: 99C; Critical:99C  
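
To make the table easier to read at a glance, here is a minimal sketch that computes how much headroom each sensor has before its caution and critical thresholds. It assumes the iLO text layout shown above; the names `ILO_TEXT`, `ROW` and `headroom` are made up for this example, and only a few rows are pasted in.

```python
import re

# A few rows copied from the iLO table above; paste the full table to check every sensor.
ILO_TEXT = """\
Temp 1:          System Zone     Ok      15C      Caution: 42C; Critical:47C
Temp 20:         System Zone     Ok      42C      Caution: 53C; Critical:58C
Temp 21:         System Zone     Ok      49C      Caution: 53C; Critical:58C
Temp 22 (PCIR):  I/O Board Zone  n/a     n/a      Caution: 99C; Critical:99C
"""

# Assumed row layout: "Temp N [(label)]:  Zone  Status  ReadingC  Caution: XC; Critical:YC"
ROW = re.compile(
    r"^Temp (?P<id>\d+)[^:]*:\s+(?P<zone>.+?)\s{2,}(?P<status>\S+)\s+"
    r"(?P<reading>\d+)C\s+Caution:\s*(?P<caution>\d+)C;\s*Critical:\s*(?P<critical>\d+)C"
)

def headroom(text):
    """Return (id, zone, degrees to caution, degrees to critical), tightest first."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # rows without a numeric reading (n/a) and headers are skipped
        reading = int(m["reading"])
        rows.append((
            int(m["id"]),
            m["zone"],
            int(m["caution"]) - reading,
            int(m["critical"]) - reading,
        ))
    return sorted(rows, key=lambda r: r[3])

for sensor_id, zone, to_caution, to_critical in headroom(ILO_TEXT):
    print(f"Temp {sensor_id:>2} ({zone}): {to_caution}C to caution, {to_critical}C to critical")
```

With the night-time readings above, sensor 21 comes out first: 4C below caution and 9C below critical, which leaves very little margin for a warm day.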

Many thanks in advance.

Seb Boulet
  • Are you sure you have a DL180 G6? Our sensor count is different and the thresholds are not the same. What is the disk layout and RAID controller setup in this server? – ewwhite Jun 16 '14 at 11:39
  • It's a DL180 G6 SE326M1 with 25x SFF disks and a P812 – Seb Boulet Jun 16 '14 at 13:04
  • A P812 in that server doesn't quite make sense unless you're running external storage as well. I'd be using a P410. In either case, it's still a supported configuration. I'd ensure that your firmware is up-to-date on the system BIOS and iLO. If no change, I'd call HP. – ewwhite Jun 16 '14 at 13:27
  • Thanks for your input. Yes, the P812 is driving an MSA60 with 3TB SATA drives and the internal SFFs. I last ran HP SUM in March but I'll give it another go. – Seb Boulet Jun 16 '14 at 13:54
  • That's pretty recent. Call HP or see if you have a BIOS option for "increased cooling" because of the P812's presence. Can you also post the output of `hplog -f`? – ewwhite Jun 16 '14 at 14:34

1 Answer


Does it matter which sensor #21 is? What would you actually do about it?

Can you check your ambient temperature? What can you control about your environment to keep that within a reasonable range? Are you absolutely sure you don't have a failed fan?

--edit--

It makes sense to ensure the firmware of ALL of your components is up-to-date. For you, that means your system BIOS, iLO, RAID controller, NIC, backplane and disks. These can all be covered by the HP Support Pack for ProLiant bootable DVD. Please download and run it.

Check the internal health LED on the server. If you're running a supported version of Linux or Windows, install the HP management agents and check the output of `hplog -t` to get temperature sensor information. The output of a standard DL180 G6 config looks like the following. Correlate your results with mine:

Note: there are four fans in this system.

# hplog -f
ID     TYPE        LOCATION      STATUS  REDUNDANT FAN SPEED
 1  Var. Speed   System Board    Normal     N/A     Normal   ( 55)
 2  Var. Speed   System Board    Normal     N/A     Normal   ( 59)
 3  Var. Speed   System Board    Normal     N/A     Normal   ( 63)
 4  Var. Speed   System Board    Normal     N/A     Normal   ( 53)

# hplog -t
ID     TYPE        LOCATION      STATUS    CURRENT  THRESHOLD
 1  Basic Sensor Mem. Brd. (1)  Normal    78F/ 26C 188F/ 87C
 2  Basic Sensor Mem. Brd. (1)  Normal    78F/ 26C 188F/ 87C
 3  Basic Sensor Mem. Brd. (1)  Normal    82F/ 28C 188F/ 87C
 4  Basic Sensor Mem. Brd. (1)  Absent   ---F/---C ---F/---C
 5  Basic Sensor Mem. Brd. (1)  Normal    80F/ 27C 188F/ 87C
 6  Basic Sensor Mem. Brd. (1)  Absent   ---F/---C ---F/---C
 7  Basic Sensor Mem. Brd. (1)  Normal   104F/ 40C 203F/ 95C
 8  Basic Sensor Mem. Brd. (2)  Absent   ---F/---C ---F/---C
 9  Basic Sensor Mem. Brd. (2)  Absent   ---F/---C ---F/---C
10  Basic Sensor Mem. Brd. (2)  Absent   ---F/---C ---F/---C
11  Basic Sensor Mem. Brd. (2)  Absent   ---F/---C ---F/---C
12  Basic Sensor Mem. Brd. (2)  Absent   ---F/---C ---F/---C
13  Basic Sensor Mem. Brd. (2)  Absent   ---F/---C ---F/---C
14  Basic Sensor Mem. Brd. (2)  Absent   ---F/---C ---F/---C
15  Basic Sensor System Board    Absent   ---F/---C ---F/---C
16  Basic Sensor System Board    Absent   ---F/---C ---F/---C
17  Basic Sensor Ambient         Normal    71F/ 22C 140F/ 60C
18  Basic Sensor Ambient         Normal    80F/ 27C 158F/ 70C
19  Basic Sensor System Board    Normal    68F/ 20C 233F/112C
20  Basic Sensor System Board    Normal    86F/ 30C 174F/ 79C
21  Basic Sensor System Board    Normal    73F/ 23C 140F/ 60C
22  Basic Sensor System Board    Normal    75F/ 24C 143F/ 62C
23  Basic Sensor System Board    Normal    75F/ 24C 143F/ 62C
24  Basic Sensor System Board    Normal    71F/ 22C 143F/ 62C
25  Basic Sensor System Board    Normal    71F/ 22C 172F/ 78C
26  Basic Sensor System Board    Normal    71F/ 22C 172F/ 78C
27  Basic Sensor System Board    Normal    71F/ 22C 172F/ 78C
28  Basic Sensor System Board    Normal    77F/ 25C 185F/ 85C
29  Basic Sensor System Board    Normal    75F/ 24C 185F/ 85C
30  Basic Sensor System Board    Normal    77F/ 25C 185F/ 85C
31  Basic Sensor System Board    Normal    84F/ 29C 159F/ 71C
32  Basic Sensor System Board    Normal    86F/ 30C 176F/ 80C
33  Basic Sensor System Board    Normal    68F/ 20C 140F/ 60C
34  Basic Sensor System Board    Normal    68F/ 20C 140F/ 60C
35  Basic Sensor System Board    Normal   107F/ 42C 230F/110C
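
One way to do the correlation is to parse both `hplog -t` outputs and list the sensors that run noticeably hotter on your machine. The sketch below is only an illustration: `parse_hplog_t`, `compare` and `min_delta` are names invented here, the `MINE` rows are hypothetical values rather than real output from the asker's server, and comparing by sensor ID is only meaningful when both machines expose the same sensor layout.

```python
import re

# Assumed row layout: "ID  Basic Sensor  LOCATION  STATUS  xxF/ yyC  xxF/ yyC"
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+Basic Sensor\s+(?P<loc>.+?)\s{2,}(?P<status>\S+)\s+"
    r"\d+F/\s*(?P<cur>\d+)C\s+\d+F/\s*(?P<thr>\d+)C"
)

def parse_hplog_t(text):
    """Map sensor ID -> (location, current C, threshold C); Absent rows are skipped."""
    sensors = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            sensors[int(m["id"])] = (m["loc"], int(m["cur"]), int(m["thr"]))
    return sensors

def compare(reference, mine, min_delta=10):
    """Return (id, location, reference C, mine C) for sensors at least min_delta C hotter."""
    hotter = []
    for sensor_id, (loc, cur, _thr) in sorted(mine.items()):
        if sensor_id in reference and cur - reference[sensor_id][1] >= min_delta:
            hotter.append((sensor_id, loc, reference[sensor_id][1], cur))
    return hotter

# Rows copied from the reference output above.
REFERENCE = """\
 4  Basic Sensor Mem. Brd. (1)  Absent   ---F/---C ---F/---C
17  Basic Sensor Ambient         Normal    71F/ 22C 140F/ 60C
21  Basic Sensor System Board    Normal    73F/ 23C 140F/ 60C
"""

# HYPOTHETICAL rows for illustration only; replace with your own `hplog -t` output.
MINE = """\
17  Basic Sensor Ambient         Normal    82F/ 28C 140F/ 60C
21  Basic Sensor System Board    Normal   120F/ 49C 140F/ 60C
"""

for sensor_id, loc, ref_c, my_c in compare(parse_hplog_t(REFERENCE), parse_hplog_t(MINE)):
    print(f"Sensor {sensor_id} ({loc}): reference {ref_c}C, mine {my_c}C")
```

A sensor that is far hotter than the reference while its neighbours are only a few degrees up is the one worth investigating first.
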
ewwhite
  • Well, what he CAN do is make sure that the sensor reading makes sense. I have two identical SuperMicro cases, one above the other. In one I get temperature warnings from a sensor on the SAS backplane; in the other, the same sensor somehow reads 20 degrees lower. We checked airflow and room temperature multiple times: this ONE sensor is off. WAY off. – TomTom Jun 16 '14 at 07:44