1

I recently took over a role and inherited this server, a DL360 G7. It has two power supplies, both OK. One fan is degraded and the others are all green. It runs VMware ESXi with three VMs installed. The server runs for a few hours (not consistently: sometimes 3 to 4 hours, sometimes longer, but always less than 12 hours) and then becomes unresponsive. The health LED blinks red but the power LED remains green. I tried pressing and holding the power button to shut it down, but it won't; instead the health LED turns amber for a few seconds, the fans speed up a bit, then the LED turns red again and nothing happens. The only way to turn it back on is to unplug both power cords and plug them in again. Once on, it is accessible again, but after a few hours the problem repeats. I was wondering whether the degraded fan is causing this, or whether there is a bigger problem. I'd appreciate any help. Thanks.

ewwhite
Tim Al

2 Answers

1

A single broken fan should never cause this kind of "critical" (that's what health light = red means) problem. Your server has probably six hot swappable fans that create a wind tunnel through the chassis, so if five are really up then you should be fine IMO.

You need to get a handle on what component is failing/has failed/is getting too hot. Use HP Insight Diagnostics software to look at your hardware in detail whether it is running or turned off.


If using diagnostics does not highlight an obvious problem for you, your HP Proliant also has a non-maskable interrupt function. There's a pair of pins on the server's mobo which, when shorted, will initiate a crash dump for you to review for problem details.

The pin location should be shown on the big label under the top cover of the server, but be sure to carefully read HP's documentation first.

Tedwin
  • Hi, many thanks for your response. I am able to log in to iLO's System Information, and everything is normal: temperatures are well below threshold, and the power supplies, processors, RAM, NICs and drives are fine, except the fan, which shows as degraded (but still spinning at 18%). I have noticed that when the server is in the "unresponsive state", the SID (Systems Insight Display) shows both CPUs lit steady orange, along with the failing fan at block 4. Do you think the CPUs are causing this failure? I will have a look at those pins once I'm in the office first thing in the morning. Really appreciate your input. – Tim Al Jun 19 '16 at 18:37
  • Don't do anything with the jumper pins. – ewwhite Jun 19 '16 at 18:51
0

Log onto the ILO3 interface and it will tell you exactly what the server's health history and specific problems are within the IML log. This is the best option for a VMware installation since you likely don't have the HP-specific ESXi build running on your system.

You should also replace the failed fan.

ewwhite
  • Hi, thank you for your reply. I can actually log in to the iLO interface. Under System Information everything is normal: temperatures are well below threshold, and the power supplies, processors, RAM, NICs and drives are fine, except the fan. The IML shows two cautions: fan failure and fan solution not fully redundant. Yes, I will be ordering the new fan ASAP. – Tim Al Jun 19 '16 at 18:51
  • The server may have a failing fan in addition to the one that's already bad. Trace the logs to see what correlates to the outage times. Losing two fans on that platform will halt the server. – ewwhite Jun 19 '16 at 18:52
  • Thanks, I hope it's only the fan. I will come back here as soon as I have installed a new fan and seen what happens. By the way, I have noticed in the iLO event log the entries "On-board clock set; was 01/01/1970" and "On-board clock set; was 06/10/2011". They keep showing up alternately. Any idea? Many thanks. – Tim Al Jun 19 '16 at 19:10
  • System board battery could be an issue too. – ewwhite Jun 19 '16 at 19:45
  • Hi, I have replaced the faulty fans and all fans are working now; however, the main problem is still there. It still becomes unresponsive after some time. The SID shows both processors lit in orange, and the health LED is red. Is there a chance the processors are the culprits? – Tim Al Jun 25 '16 at 10:23
  • What does the ILO IML log say? – ewwhite Jun 25 '16 at 10:24
  • The IML shows old entries, but the dates are "not set": fan failure, system fans not redundant, and it keeps repeating. – Tim Al Jun 25 '16 at 10:36
  • Clear the logs. Then try again. – ewwhite Jun 25 '16 at 10:37
  • Okay, I will do that. I was informed that the server wasn't really tested after it was bought "used" by the previous admin 5 months ago. Do you think a server this old can still function properly? Or should I expect parts to be failing, or to fail anytime soon? – Tim Al Jun 25 '16 at 10:50
  • These machines can be viable and should work fine. I've given you the tools to see what the system health and specific errors are. Read them and act accordingly. – ewwhite Jun 25 '16 at 10:53
  • Thanks, I really appreciate your advice. As soon as I have done that, I will get back and post the result. – Tim Al Jun 25 '16 at 10:57
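
The advice above to trace the logs and see what correlates with the outage times could be sketched roughly as follows. This is only an illustration: it assumes the log entries have already been copied out by hand into `(timestamp, message)` pairs, and the function name `events_near_outages`, the 10-minute window, and the sample data are all hypothetical. Entries whose date shows as "not set" cannot be matched this way, which is one reason to clear the log and fix the clock first.

```python
from datetime import datetime, timedelta


def events_near_outages(events, outages, window=timedelta(minutes=10)):
    """For each outage time, collect the log events logged within `window` before it.

    events  -- list of (datetime, message) pairs, transcribed manually from the log
    outages -- list of datetimes at which the server was observed to hang
    """
    matches = {}
    for outage in outages:
        matches[outage] = [
            (ts, msg)
            for ts, msg in events
            if outage - window <= ts <= outage
        ]
    return matches


# Hypothetical sample data, not taken from the server in this thread.
events = [
    (datetime(2016, 6, 25, 6, 0), "On-board clock set"),
    (datetime(2016, 6, 25, 9, 58), "Fan failure"),
]
outages = [datetime(2016, 6, 25, 10, 0)]

for outage, hits in events_near_outages(events, outages).items():
    print(outage, hits)
```

If the same message keeps turning up shortly before each hang, that component is the first thing to look at; if nothing is logged at all, the log is not capturing the fault and another diagnostic is needed.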