3

Dell OpenManage reported the following:

Memory device status is critical
Memory device location: DIMM_B2
Possible memory module event cause: Multi bit error encountered

What does this mean? How bad is it?

AXE Labs

2 Answers

1

The event message reference for this alert is 1404. It indicates a faulty DIMM that should be replaced, but from what I read on blogs, the alert often clears and does not return after a reboot. Since it only tripped once for me, I cleared the memory errors using OMSA (dcicfg32.exe), and so far so good.

AXE Labs
  • This was a good move - replacement typically isn't warranted after a single occurrence, though I'd seriously consider it if the problem ever returns on that particular DIMM. – JimNim Sep 06 '13 at 14:52
  • Similarly, I was seeing "Single bit warning error rate exceeded" and "Single bit failure error rate exceeded" on a Linux host. These can be cleared as well, but with omconfig: 'omconfig system alertlog action=clear' and 'omconfig system esmlog action=clear'. Let's hope they don't come back, or the DIMMs are headed for the trash. – AXE Labs Mar 06 '14 at 20:18
  • Make sure you've got the latest firmware/BIOS too -- I have seen cases where these sorts of errors were spurious and "fixed" by firmware. – Wil Cooley May 19 '14 at 08:09
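
For anyone wanting to script the two omconfig commands from the comment above, here is a minimal sketch. The function name `clear_omsa_logs` and the `DRY_RUN` variable are made up for illustration and are not part of OMSA; the omconfig invocations are the ones quoted in the comment. By default it only prints what it would run, so nothing is cleared by accident.

```shell
#!/bin/sh
# Sketch only: wraps the two omconfig commands quoted in the comment above.
# clear_omsa_logs and DRY_RUN are hypothetical names, not OMSA features.

clear_omsa_logs() {
    for log in alertlog esmlog; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default (dry run): print the command instead of running it.
            echo "omconfig system $log action=clear"
        else
            # Real run: requires OMSA's omconfig on PATH and root privileges.
            omconfig system "$log" action=clear
        fi
    done
}
```

Call `clear_omsa_logs` to preview, then `DRY_RUN=0 clear_omsa_logs` to actually clear. It is probably worth saving the logs first (for example with `omreport system esmlog`) so there is a record if the same DIMM trips again.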
1

Cause of error according to Dell: "A memory device correction rate exceeded an acceptable value, a memory spare bank was activated, or a multibit ECC error occurred. The system continues to function normally (except for a multibit error). Replace the memory module identified in the message during the system's next scheduled maintenance. The memory device status and location are provided."

Try replacing the DIMM with an identical one. If the memory is still under warranty, get the replacement from the same vendor.

longneck
wit