I have a brand-new install of Rocks Cluster OS 7.0 (based on CentOS) and I'm trying to test-install a few nodes. Everything on the system seems fine, but when I try to install a new node with insert-ethers, I get an error on the node that only shows for a few milliseconds before it reboots. Sorry about the error being an image, but it appears on a physically separate screen, so I can't copy and paste it.

The insert-ethers screen tells me that the node has not kickstarted yet, but the node keeps showing that error and restarting. On the other hand, the initial PXE screen says that vmlinuz and initrd are downloaded successfully.

Any hint about what might be happening is appreciated.

The node specs are:

  • Motherboard: AsRock AB350 Pro 4
  • EFI Ver: P4.70
  • Processor: AMD Ryzen 3 2200G
  • RAM: DDR4 2400 MHz, 16 GB

EDIT: I've tested the installation with different, older hardware and it works fine, so it must be some kind of incompatibility between the kernel and the node's hardware.
uname -r output: 3.10.0-693.5.2.el7.x86_64

Shirkam
  • Have you considered the possibility that the initramfs really is corrupt? – Michael Hampton Jan 21 '19 at 17:45
  • Well, I tried to decompress it and it was all fine. Maybe I checked the wrong file. Which one should I check? – Shirkam Jan 21 '19 at 17:50
  • Check the one that is sent to your node ;) Maybe you already got that right and the file is just fine. Next step is to check memory on your AsRock node. See my answer. – Freddy Jan 21 '19 at 17:59

1 Answer

The kernel fails to unpack the xz-compressed initrd from memory.

I see three possible reasons for the failure:

  1. The initrd is corrupted (not very likely).

  2. Memory on the node is bad (either a bad memory module or wrong settings in the BIOS, e.g. wrong timing parameters that run the module outside its specs).

  3. The kernel has trouble xz-decompressing the initrd. The feature is statically compiled into the kernel, but may be buggy or may not work as expected.

I would try the following:

  1. Try to xz-decompress the initrd to see if it is corrupted (no need to restore the full archive):

    # copy the initrd to /tmp and add the suffix ".xz" if missing; adjust the path accordingly
    cp /boot/initrd.img /tmp/initrd.img.xz
    # decompress; on success this leaves /tmp/initrd.img, on a corrupt file xz reports an error
    xz -d /tmp/initrd.img.xz
    
  2. Check the BIOS memory settings on the node and turn on the extended memory check if the option is available. Run a full memtest. I'm not familiar with Rocks, but here is a manual on how to run Memtest86 from it.

  3. Try to boot the node from the uncompressed initrd from step 1 and/or use a different compression algorithm (gzip, bzip2, ...). Note that the kernel must support whichever compression mode you choose.
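
Steps 1 and 3 could be scripted roughly as follows. This is only a sketch, not something tested on Rocks: the function names `initrd_is_valid_xz` and `initrd_xz_to_gzip` are made up for illustration, and it assumes the initrd is a plain xz stream and that the node's kernel was built with gzip initramfs support. It uses `xz -t`, which tests the stream without writing the decompressed data anywhere.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of Rocks.

# Step 1: test the xz stream without writing anything to disk.
# Reading from stdin avoids any dependence on the file suffix.
initrd_is_valid_xz() {
    xz -t < "$1" 2>/dev/null
}

# Step 3: re-pack an xz-compressed initrd as gzip, leaving the original untouched.
# Assumption: the kernel booted on the node supports gzip-compressed initramfs.
initrd_xz_to_gzip() {
    # Refuse to convert a stream that does not pass the integrity test.
    initrd_is_valid_xz "$1" || return 1
    xz -dc < "$1" | gzip -9 > "$2"
}
```

For example, `initrd_is_valid_xz /tmp/initrd.img.xz && echo "xz stream OK"` checks the copy made in step 1, and `initrd_xz_to_gzip /tmp/initrd.img.xz /tmp/initrd.img.gz` produces a gzip variant to serve to the node instead.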

Freddy
  • I cannot do step #2 as the node is not yet installed. I'll try to launch it with a linux USB, or something similar. – Shirkam Jan 21 '19 at 18:05
  • I've tested the installation with different hardware and it just works, so it must be some type of incompatibility. – Shirkam Jan 22 '19 at 12:31
  • So the memtest was successful and there were no "wrong" memory settings in the BIOS? – Freddy Jan 22 '19 at 12:37
  • Yes. And the nodes can be installed with another OS like Ubuntu 18. I did that as a test. – Shirkam Jan 22 '19 at 12:48
  • You mean from PXE? Did you try to send an unpacked initrd? And is your memory in the [Qualified Vendor List](https://www.asrock.com/MB/AMD/AB350%20Pro4/index.asp#Memory)? It could just be picky memory. If possible, I would try modules from a different vendor. – Freddy Jan 22 '19 at 12:53
  • Let's talk about it [in this chat room](https://chat.stackexchange.com/rooms/88622/rocks-pxe-problem) – Shirkam Jan 22 '19 at 13:23