
We have a Dell M1000e blade system with 16 M610 servers, and we frequently get the following errors on random servers:

The system board Current Latch current is outside of the allowable range.

The system board fail-safe voltage is outside of the allowable range.

Once a server reports these errors, it shuts down permanently and stops responding to any action.

I have googled the error many times but didn't find any clear answer.

Does anyone have a clue about this error or how to resolve it?

What could cause such an error?

Is this error a symptom of another failure in the system?

Please help.

Thank you for reading.

1 Answer


This is a hardware fault. Either the PSU is faulty, the power connection is faulty, or the motherboard in question is faulty.

Since it happens to random blades, it's not likely to be an issue with any particular system board, which leaves the PSU or the power connection. First power connection I'd check is the power connection to the enclosure. Could be getting dirty power.

You should check your system logs for details and to narrow it down, but if it's not dirty power to the enclosure, all you can really do is check the connection between the blades and the enclosure (make sure it's clean, not corroded, and firmly seated), and failing that, replace the defective component.
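As a rough illustration of narrowing things down from the logs, here is a minimal Python sketch that tallies non-recoverable events per blade slot from exported enclosure log text. The only log line taken from this thread is "Server 14 health changed to a non-recoverable state"; the function name, the sample data, and the assumption that the log can be exported as plain text lines are all hypothetical.

```python
import re
from collections import Counter

# Assumption: enclosure log entries look like the one quoted in this thread,
# "Server 14 health changed to a non-recoverable state".
PATTERN = re.compile(r"Server (\d+) health changed to a non-recoverable state")


def failures_per_slot(log_lines):
    """Count non-recoverable events per enclosure slot number."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    "Server 14 health changed to a non-recoverable state",
    "Server 3 health changed to a non-recoverable state",
    "Server 14 health changed to a non-recoverable state",
    "Fan 2 speed normal",  # unrelated line, ignored by the pattern
]

counts = failures_per_slot(sample)
print(counts.most_common())
```

If the failures cluster on a few slots, that points at those slots or blades; if they are spread evenly across the enclosure, the shared power path is the more likely suspect.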

HopelessN00b
  • For such a big enclosure, I'm pretty sure you can't lose more than 50% of the PSUs (like a Dell VRTX, which when fully populated needs 3 PSUs out of 4, i.e. 3+1). – yagmoth555 May 25 '16 at 19:30
  • HopelessN00b, when I move the failed server to another known-good slot, I get the same result. Regarding the power input, we have an APC SRT 10k UPS, and the voltage output monitor is clean. – Shady Abdelaziem May 25 '16 at 19:37
  • @ShadyAbdelaziem And what do the system logs say? Both the enclosure's logs and the individual blades. I don't know why you're troubleshooting without looking at the logs. – HopelessN00b May 25 '16 at 19:40
  • @HopelessN00b, I did look at them, for sure. "The system board Current Latch current is outside of the allowable range." and "The system board fail-safe voltage is outside of the allowable range." are the only 2 errors I got; then the server turns red in the iDRAC web interface. – Shady Abdelaziem May 25 '16 at 19:44
  • @HopelessN00b, the enclosure error is "Server 14 health changed to a non-recoverable state". – Shady Abdelaziem May 25 '16 at 19:45
  • It's a critical hardware problem, put the internet away and call Dell – Sum1sAdmin May 26 '16 at 07:20