0

The server is a VMware ESXi 5.5 host with 65 days of uptime. The LCD on the server is showing an error: "Persistent correctable memory error rate has increased for a memory device at location DIMM B4". I'd like to look in DRAC to see what's going on, but I cannot access it. The last time I needed DRAC (a few months ago; I can't recall exactly) I was able to reach it via the web portal. Now it does not respond to ping, whether I use the hostname (we have a DNS A record for it) or the IP address directly. I am in the same office as the server, on the same LAN.

Is there anything I can do while the server is up, or do I just have to reboot it during the next maintenance window? This is an ESXi host running several VMs, so I cannot try anything risky during normal business hours. Thanks.

SamAndrew81

2 Answers

1

On the LCD interface, you should be able to navigate to view the iDRAC IP address. Verify it is still valid. If so, the firmware could be frozen.
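One way to tell a frozen iDRAC from a network problem is to probe its management ports from a workstation on the same LAN. The following is a minimal sketch, not a Dell tool: the function names are made up for illustration, and the port list (443 and 80 for the web console, 22 for SSH) assumes the iDRAC is using its default service ports. If ping fails but a port answers, ICMP is probably being filtered; if nothing answers at all, the cause could be either hung firmware or a cabling/network problem.

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe(host: str, ports=(443, 80, 22)) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port) for port in ports}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python probe_idrac.py <iDRAC IP shown on the LCD>
    for port, is_open in probe(sys.argv[1]).items():
        print(f"{sys.argv[1]}:{port} -> {'open' if is_open else 'no response'}")
```

This only reads from the network, so it should be safe to run during business hours.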

As for your memory error rate, this may be a failing DIMM in slot B4. If it is not corrected, the usable memory may be reduced or the server may crash (purple screen of death). Do you currently have a support contract with Dell? If so, log a case with them and they should overnight a new RAM module. If not, you'll need to consider replacing it soon.

If your host is in a cluster, I would advise vMotioning your VMs off it and putting this server in maintenance mode until the problem can be addressed.

  • Thank you, Jarrod. Yes, I have double-checked the IP on the LCD. We do have vMotion, but I only have one other host and it does not have the resources to accommodate all the VMs (SMB budget, what can I say). I've also opened a ticket with our warranty provider. – SamAndrew81 Dec 08 '16 at 23:39
  • Are you able to check the status of the iDRAC or LifeCycle Controller on the LCD? – Jarrod L. J. Gibson Dec 08 '16 at 23:41
  • I don't think so. If that is possible, I'm not sure how to do it. I'll google it... – SamAndrew81 Dec 09 '16 at 00:26
  • I have rebooted the server now: no change – SamAndrew81 Dec 09 '16 at 02:47
  • It's likely going to be bad RAM. If you can still do a reboot, try getting into the LifeCycle Controller and testing the hardware. – Jarrod L. J. Gibson Dec 09 '16 at 05:59
  • Sorry, to clarify: the RAM preemptive failure warning is now gone after re-seating the RAM module. However, I still cannot access the iDRAC web console. Also, I'm not sure whether this server has a LifeCycle Controller. The only config options I see on boot are for the BIOS, RAID controller and iDRAC. – SamAndrew81 Dec 09 '16 at 16:23
  • Next time you reboot, go into your iDRAC settings and check the configuration: whether it's set to a shared LOM or the dedicated port. Also make sure HTTP/S is enabled. – Jarrod L. J. Gibson Dec 09 '16 at 16:33
  • I allocated NIC0 and NIC1 for VMware to use on vSwitch0. I'm thinking that is the issue here... – SamAndrew81 Dec 09 '16 at 20:14
  • It shouldn't be. You can share a LOM with the host OS. You can have multiple vmnics assigned to a single switch, even if iDRAC is using it as a shared LOM. However, if it's not shared in iDRAC, you won't have remote access through the NICs. – Jarrod L. J. Gibson Dec 09 '16 at 20:33
1

Got it. The issue was that NIC0 was being used for vMotion, so it was patched into the iSCSI switch rather than the LAN switch. When I moved the NIC0 patch cable over to the LAN switch, the IP started responding to ping and I was able to open the iDRAC web console.
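For anyone hitting the same symptom, a quick sanity check is whether the iDRAC address is in the same subnet as the workstation. If it is and it still does not answer, routing is unlikely to be the cause, and the problem is more likely cabling, VLAN assignment, or which physical port the iDRAC is bound to (as it was here). A minimal sketch using Python's standard `ipaddress` module; the function name is made up for illustration and the addresses in the usage lines are placeholders, not the ones from this setup:

```python
import ipaddress


def same_subnet(workstation_cidr: str, target_ip: str) -> bool:
    """Return True if target_ip falls inside the workstation's subnet.

    workstation_cidr is the workstation address with its prefix length,
    e.g. "192.168.1.50/24".
    """
    network = ipaddress.ip_interface(workstation_cidr).network
    return ipaddress.ip_address(target_ip) in network


# Placeholder addresses for illustration only.
print(same_subnet("192.168.1.50/24", "192.168.1.120"))  # True
print(same_subnet("192.168.1.50/24", "10.0.0.5"))       # False
```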

SamAndrew81