
Any idea what this error, E211 SBE LOG DISABLE DIMM6, could mean? Has anyone experienced this?

I know it sounds pretty obvious that this refers to the 6th bank of my memory, but guess what:

  • I have already changed the order of the RAM modules. Same error on the same DIMM slot.
  • Dell has already replaced the mobo with a new one. With the RAM modules connected in a different order, the same error appears. I have already cleared the logs with the OMSA live CD.
  • Booting into memtest+ shows nothing.
  • All available firmware on this motherboard is up to date.

Could it be another component, or a firmware issue?

Dell is looking into it too, but found nothing in the system logs collected by OMSA.

A clear symptom is that the error starts after one hour of operation. The operating system on this hardware is ESXi 5.0.1. The error has not caused any system crash.

Edit: I cleared the BIOS logs with /opt/dell/dset/clearesm.sh (OMSA live CD) on the new mobo and rebooted into memtest+ (still from the live CD). After 20 minutes the error message appeared on the display, while memtest found no errors...

Edit 2: Neither ./dcicfg32 command=clearmemfailures nor clearing the BMC log from the BIOS (Ctrl+E during POST -> System Event Log menu -> Clear System Event Log) resolves the issue. After 20 minutes of operation, the error comes back.

Edit 3: The mobo was replaced (see above), and both the old and the new board show the same error. Swapping memory positions or using memory from another 2950 server does not change the error.

2 Answers


This indicates that single-bit errors (SBEs) have occurred on DIMM 6 so frequently that the system has stopped logging them until it is rebooted. (See https://support.quest.com/SolutionDetail.aspx?id=SOL60022 for background.)
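To make the "stops logging until reboot" behaviour concrete, here is a minimal Python sketch of a rate-limited SBE logger. It is only a model: the class name `SbeLogger`, the threshold, the time window, and the simulated log strings are all hypothetical, because the real limits live in the BIOS/BMC firmware and are not given here.

```python
from collections import deque


class SbeLogger:
    """Hypothetical model of per-DIMM SBE logging that switches itself off.

    Assumption: if more than `threshold` corrected single-bit errors arrive
    on one DIMM within `window_s` seconds, logging for that DIMM is disabled
    until reset() is called (standing in for a reboot). The real firmware
    thresholds are not known here.
    """

    def __init__(self, threshold=10, window_s=3600.0):
        self.threshold = threshold
        self.window_s = window_s
        self.recent = {}        # dimm -> deque of recent error timestamps
        self.disabled = set()   # DIMMs whose SBE logging is switched off
        self.log = []           # simulated event-log entries

    def report_sbe(self, dimm, t):
        if dimm in self.disabled:
            return  # errors still occur (and are corrected) but are not logged
        times = self.recent.setdefault(dimm, deque())
        times.append(t)
        # Discard timestamps that have fallen out of the sliding window.
        while times and t - times[0] > self.window_s:
            times.popleft()
        if len(times) > self.threshold:
            self.disabled.add(dimm)
            self.log.append(f"SBE LOG DISABLE DIMM{dimm}")
        else:
            self.log.append(f"SBE DIMM{dimm}")

    def reset(self):
        """Model a reboot: counters cleared, logging re-enabled."""
        self.recent.clear()
        self.disabled.clear()


logger = SbeLogger(threshold=3, window_s=60)
for t in range(5):
    logger.report_sbe(6, t)
print(logger.log)
# ['SBE DIMM6', 'SBE DIMM6', 'SBE DIMM6', 'SBE LOG DISABLE DIMM6']
```

In this model the fifth error is silently dropped, which is why a "log disabled" message says nothing about how many errors occurred afterwards.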

It's a bit perplexing that you're seeing the same error after replacing the motherboard, but it is possible that the replacement board has the same defect as the first one. Since you moved the DIMMs around and the problem hasn't followed the DIMM, I'm less inclined to suspect the DIMM.
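The swap-test reasoning (does the error follow the slot or the module?) can be written down as a small decision function. This is a hypothetical helper for illustration, not a Dell tool: each observation is a `(slot, module_id, error_seen)` tuple recorded after a reshuffle, and the function only reports which single-part explanation, if any, fits the observations.

```python
def localize_fault(observations):
    """Decide whether observed errors follow a slot or a module.

    observations: list of (slot, module_id, error_seen) tuples, one per
    slot per test run. Hypothetical helper for illustration only.
    """
    bad = [(slot, mod) for slot, mod, err in observations if err]
    good = [(slot, mod) for slot, mod, err in observations if not err]
    if not bad:
        return "no error observed"
    bad_slots = {slot for slot, _ in bad}
    bad_mods = {mod for _, mod in bad}
    good_slots = {slot for slot, _ in good}
    good_mods = {mod for _, mod in good}
    # One faulty slot: every error is in one slot, and that slot never ran clean.
    follows_slot = len(bad_slots) == 1 and not (bad_slots & good_slots)
    # One faulty module: every error is on one module, and it never ran clean.
    follows_module = len(bad_mods) == 1 and not (bad_mods & good_mods)
    if follows_slot and follows_module:
        return "inconclusive: swap modules and retest"
    if follows_slot:
        return "slot"
    if follows_module:
        return "module"
    return "neither alone: suspect several parts or another component"


# Modules rotated between slots 5 and 6; the error stays in slot 6.
print(localize_fault([(6, "A", True), (5, "B", False),
                      (6, "B", True), (5, "A", False)]))  # slot
```

Note that a "slot" result only says the fault is not in a single module; it cannot separate the slot itself from the board or another component behind it.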

I would use the appropriate Dell MpMemory diagnostic for that server rather than memtest+. The Dell tool is going to be aware of any Dell-specific hardware features.
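One plausible reason a generic pattern tester can pass while the platform still reports SBEs: on ECC memory, a single flipped bit is corrected before software sees the data, so the value read back matches what was written. Only the error-correction hardware "knows" a correction happened, so a tool that reads the platform's error reporting may see what a pattern test cannot. The sketch below illustrates the idea with a textbook Hamming(7,4) single-error-correcting code. Server ECC DIMMs use a wider code (typically 8 check bits per 64 data bits), so this is a simplified illustration, not Dell's implementation.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Return (data_bits, syndrome); a non-zero syndrome is the corrected position."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos          # XOR of positions holding a 1
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the single bad bit back
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
word = hamming74_encode(data)
word[4] ^= 1                         # simulate a single-bit error at position 5
read, syndrome = hamming74_decode(word)
# The reader gets the original data back; only the syndrome reveals the error.
print(read == data, syndrome)        # True 5
```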

Evan Anderson
  • Yeah, that is weird. Someone on this thread (http://www.fixya.com/support/t2480699-e2111_sbe_log_disable_dimm) has the same problem, and the DIMM error jumped to DIMM5 after 10 minutes. I'll boot the OMSA CD again and try to clean up the mobo logs... – Jul 09 '13 at 10:43
  • You were right. Memtest couldn't find any errors, but the Dell tool found them. – Dec 10 '13 at 09:56
  • Glad to hear you were able to isolate the fault. – Evan Anderson Dec 10 '13 at 14:11
  • Believe it or not, it was both: the memory slot on the first mobo (checked by Dell) AND the RAM module. :) – Dec 10 '13 at 15:33

One question: when you said "try to clean up the mobo logs", do you mean the BMC logs (hardware logs), or resetting the memory error counter? If you mean the BMC logs, what you actually need to do is clear the SBE counter, at least to make sure it is not a false warning.

To clear the SBE log counter, you can run the following command from the live CD: `./dcicfg32 command=clearmemfailures`.

Coré
  • Hmm. I was instructed to just clear the hardware logs with `/opt/dell/dset/clearesm.sh`. I'll try clearing the SBE logs as you describe when the machine finishes the memtest (ETA 2 hours). – Jul 09 '13 at 17:51
  • Just to give some feedback: it didn't work :( – Jul 10 '13 at 20:42
  • Well, in that case it could be the motherboard. Make sure you have the latest firmware versions: BIOS, BMC, and DRAC (if you have one). – Coré Jul 12 '13 at 05:40
  • It is in the description. We have already changed the mobo and updated the BIOS and BMC to the latest versions ;) Changing the positions of the RAM modules doesn't make any difference to the error either. – Jul 12 '13 at 10:46
  • Yes, I saw that, but from the information you are giving, it looks like the new mobo is failing in the same slot. Right now I can't think of another cause, but in my experience a replaced motherboard showing the same error is very strange. – Coré Jul 12 '13 at 14:57
  • Yeah, I know, and the support guys have the same opinion. I think it could be another component because, even with another mobo AND other RAM modules, the error persists. Weird. – Jul 12 '13 at 18:50