
I'm putting an older Supermicro X9DRG-QF motherboard back into service as a VM host. The machine did a great, error-free tour of duty in a digital installation a while back, so I can confirm it was working fine in a previous life. I upgraded the RAM to 128GB (8x16GB DDR3-1866 PC3-14900 ECC) and installed FreeNAS 11.2 for testing purposes. Worked like a champ, no issues.

I've recently been kicking the tires on VMware ESXi, and for some reason, since that change, the motherboard has been reporting that one (and only one) DIMM slot is going into UNR (upper non-recoverable) territory: 127 degrees C, according to the real-time readings on the IPMI dashboard. By direct observation (touch), however, I can confirm that the DIMM temperature is not actually spiking out of range, not even remotely.

I have swapped the DIMMs between slots, which confirms it's not a specific DIMM. Does this reek of a failing motherboard? Any advice on how to isolate the issue would be greatly appreciated.

Darren

  • What exactly do you mean by "digital installation"? Like a kiosk? Might be relevant if it was in a stressful environment in some way. For example, could there be dust or corrosion on your DIMM socket contacts? – Spooler Apr 02 '19 at 01:28
  • Yes, this was used to power a video wall, so it was running 24/7 for several years. I'll see about properly cleaning out the DIMM socket contacts, good call. – darrendavid Apr 02 '19 at 17:52

1 Answer


You've probably already isolated the issue as precisely as you need to. You've even used the IPMI card to rule out OS-specific issues. At this point it's reasonable to conclude that your hardware or firmware is faulty.

You might try resetting the firmware and possibly upgrading it. More than likely, though, the hardware has become faulty: the issue appeared recently and persists after the DIMMs were swapped.
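If you want to track the reading over time (for example, before and after a BMC reset or firmware upgrade), a small script can pull the temperature sensors and flag implausible values. The sketch below is an assumption-laden illustration, not a Supermicro tool: it assumes `ipmitool` is installed on the host and that `ipmitool sdr type Temperature` prints pipe-separated rows whose reading column looks like "35 degrees C". The 100 degrees C sanity limit and the helper names are hypothetical. It also flags readings pinned at exactly 127 degrees C, since 127 (0x7F) is the largest value a signed 8-bit field can hold, which is consistent with (though not proof of) a bogus sensor value rather than a real temperature.

```python
import re
import subprocess
from typing import Dict, List, Tuple

# Matches a temperature reading such as "35 degrees C" inside one sdr column.
_READING = re.compile(r"(-?\d+(?:\.\d+)?)\s+degrees C")


def read_sdr() -> str:
    """Return raw output of `ipmitool sdr type Temperature` (assumes ipmitool is installed)."""
    result = subprocess.run(
        ["ipmitool", "sdr", "type", "Temperature"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_temperatures(sdr_text: str) -> Dict[str, float]:
    """Map sensor name -> degrees C from pipe-separated sdr output.

    The first column is taken as the sensor name; the first later column
    containing "<number> degrees C" is taken as the reading. Rows without a
    numeric reading (e.g. "no reading") are skipped.
    """
    temps: Dict[str, float] = {}
    for line in sdr_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2:
            continue
        for field in fields[1:]:
            match = _READING.search(field)
            if match:
                temps[fields[0]] = float(match.group(1))
                break
    return temps


def flag_suspect(temps: Dict[str, float], limit: float = 100.0,
                 pinned: float = 127.0) -> List[Tuple[str, float, str]]:
    """Return (sensor, value, reason) for readings that look implausible.

    `limit` is a hypothetical sanity ceiling; `pinned` is the 127 degrees C
    value reported on the IPMI dashboard (0x7F, max signed 8-bit value).
    """
    suspects: List[Tuple[str, float, str]] = []
    for name, value in sorted(temps.items()):
        if value >= pinned:
            suspects.append((name, value, "pinned at maximum: likely bogus"))
        elif value > limit:
            suspects.append((name, value, "above limit"))
    return suspects
```

Running `print(flag_suspect(parse_temperatures(read_sdr())))` on a schedule and logging the result would show whether the bad value always comes from the same sensor and whether it survives a BMC reset, which helps separate a firmware problem from a hardware one.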

Spooler
  • Copy that. SuperMicro is also recommending resetting the BMC to factory defaults, which I'll try as well. Certainly never seen a mobo fail like this, but there's a first time for everything. I'll report back when I've tried all of the above. Thanks for the perspective. – darrendavid Apr 02 '19 at 17:53