1

I have three load-balanced (round robin) IBM x3650 servers running Red Hat Enterprise Linux 4.8. One of them has intermittent kernel panics and reports errors on two particular memory banks (3 and 5), both of which are empty. I only have memory modules in banks 1, 4, 7 and 10. I have tried replacing all of the memory modules, to no avail.

user57081

4 Answers

1

If they're identical machines and you can afford the downtime, consider shutting down the 'bad' server and a working server, swapping their disks, bringing them up again, and seeing whether the problem moves with the disks or stays with the hardware. If it moves with the disks, you have a disk/OS issue; if it stays, you have a hardware issue.

Chopper3
0

Intermittent kernel panics on only one of several identical machines usually indicate a hardware problem.

Banks 3 and 5 are empty: did you try cleaning the memory slot area with compressed air (held at a distance, so as not to damage the board)?

If the three machines have the same kernel version, configuration and software, a motherboard problem is likely. Is the firmware version identical on all three motherboards?
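One way to answer the firmware question is to collect the BIOS version from each server and compare them. Here is a minimal sketch, not a definitive tool: `bios_drift` is a hypothetical helper name, the host names in the usage comment are placeholders, and it assumes the installed `dmidecode` is recent enough to support `-s bios-version` (the version shipped with an older distribution may not be) and that the version string contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads "hostname bios-version" lines on stdin and
# reports whether every host runs the same BIOS version.
# Prints "consistent" (exit 0) or "drift" followed by one line per
# version with the hosts that run it (exit 1).
bios_drift() {
    awk '
        # Group host names (column 1) by version string (column 2).
        { seen[$2] = seen[$2] " " $1 }
        END {
            n = 0
            for (v in seen) n++
            if (n <= 1) { print "consistent"; exit 0 }
            print "drift"
            for (v in seen) print v ":" seen[v]
            exit 1
        }'
}

# Example usage (assumed host names, run from an admin box with root ssh):
#   for h in web1 web2 web3; do
#       echo "$h $(ssh root@$h dmidecode -s bios-version)"
#   done | bios_drift
```

If the output is "drift", the listed versions show which server is behind.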

Déjà vu
0

We just had a similar issue this week with an x3650 (machine type 7979).

We were running BIOS v. 1.03 (which shipped with the system). Support recommended upgrading the BIOS. We had two systems in a similar load-balanced setup, but their configurations had drifted: one server had BIOS v. 1.15 and the other was running a very old version. The system with the older BIOS was the one with the issues.

The BIOS changelog cites several memory-related issues that were fixed. I recommend upgrading the BIOS using UpdateXpress or the Bootable Media Creator. If that doesn't work, dial 1800IBMSERV.

You can check your BIOS version by installing the IBM DSA utility (available at Fix Central) and running, as root:

./opt/IBM/DSA/bin/biosversion

andyhky
-1

Yes, errors from non-existent DIMMs do look like a motherboard fault, but it may be easier to try a BIOS update first. AFAIK the x3650 BIOS lets you reduce the memory speed, which is also worth trying.

Marx