
I have a Gigabyte GA-965P-DQ6 motherboard with the latest BIOS flashed in (version F13).

I also have four SATA disks (in proven working condition) from a different server. Two of them are Western Digital ATA WDC WD1003FBYZ-0 and the other two are the very similar ATA WDC WD1003FBYX-0.

To connect the disks I used two ASMedia ASM1061 PCIe x1 to SATA controllers (firmware version 3.80), since all of my motherboard's SATA and PATA ports are already taken by other hard disks.

The problem is that once I connect the disks to the SATA controllers, the boot process stops sometime before the LILO boot loader menu is loaded (I am running Slackware Linux). In fact, the boot process gets stuck at the following screen:

[Screenshot: boot process stops here]

If I remove the disks but keep the controllers installed in the motherboard, the boot continues normally. Once booted, I can hot-plug the disks and they show up fine in the system (verified with dmesg). I have used cfdisk to overwrite the MBR of each disk and to make sure that the bootable flag is not set.
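
For anyone who wants to double-check what cfdisk left behind, here is a minimal Python sketch that inspects the first sector of a disk. It assumes the classic MBR layout (boot code in the first 440 bytes, four 16-byte partition entries starting at offset 446, and the 0x55 0xAA signature at offset 510). The names `inspect_mbr` and `read_first_sector` are hypothetical helpers made up for this illustration, and any device path passed in (for example `/dev/sdX`) is an assumption that has to be adapted to the actual system.

```python
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 440          # bytes 0..439 hold the boot code in a classic MBR
PARTITION_TABLE_OFFSET = 446  # four 16-byte partition entries start here
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80            # status byte value for an "active"/bootable entry


def inspect_mbr(sector: bytes) -> dict:
    """Summarise the boot-related fields of a 512-byte MBR sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")

    # Partition numbers (1-based) whose status byte marks them as active.
    active = []
    for index in range(4):
        status = sector[PARTITION_TABLE_OFFSET + index * ENTRY_SIZE]
        if status == ACTIVE_FLAG:
            active.append(index + 1)

    return {
        # True if the sector ends with the 0x55 0xAA boot signature.
        "has_signature": sector[510:512] == b"\x55\xaa",
        "active_partitions": active,
        # True if any byte of the boot code area is non-zero.
        "has_boot_code": any(sector[:BOOT_CODE_SIZE]),
    }


def read_first_sector(path: str) -> bytes:
    """Read the first 512 bytes of a device or image file (usually needs root)."""
    with open(path, "rb") as device:
        return device.read(SECTOR_SIZE)
```

Calling `inspect_mbr(read_first_sector(path))` for each of the four disks would show whether a boot signature, an active partition flag, or leftover boot code is still present. This only confirms what is on the disks; it says nothing about how the controller's option ROM or the BIOS reacts to them.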

I have also double-checked that the BIOS uses the disk I actually want as the first boot device (it is not one of these 4 disks), and I can verify that this is the case. Here you can see the boot devices (disks and PCI cards) that the BIOS has identified:

[Screenshot: BIOS has identified the following devices]

Here you can see most of the disks that are directly attached to the motherboard's SATA ports, so none of the 4 disks that cause the problem appear. You can also see that the last option, #8 "Bootable add in cards", refers to the SATA controller cards. So the preferred boot device is at the top and the SATA controller cards are at the bottom, yet somehow the boot process still stops.

So I am out of ideas about what could be causing the boot process to halt. I have not been able to find out how to enter the SATA controller's boot configuration menu (if one exists): no message about a key combination that would open it appears, nor have I found any relevant information on the internet. Is it possible that this SATA controller model does not have a boot configuration screen?

EDIT: Thinking out loud, I wonder whether trying to boot from a non-bootable disk would normally throw a "boot failure" message. Since that does not happen here, maybe the problem is not the BIOS trying to boot from the extra disks at all, but something different altogether (no clue as to what, though).

EDIT: Relevant contents of fstab, as requested in the comments (I wonder how useful this is, though, since this is a BIOS problem):

/dev/md1         swap             swap        defaults         0   0
/dev/md0         /                reiserfs    defaults         1   1
nass
  • Did you check the HDD order in the BIOS? Sometimes you have the ability to change the order / boot order of the disks. – Thomas Oct 15 '16 at 13:53
  • @Thomas Hi there, I have added some info and a picture related to your ideas. I wish it were this, but I am afraid the problem is located somewhere deeper. – nass Oct 15 '16 at 14:31
  • If this Q does not sound enough like a "serverfault" question, perhaps it is more suitable for "superuser". So instead of closing it, perhaps it can be moved over there? – nass Oct 15 '16 at 15:53
  • Please show the contents of the `/etc/fstab` file, the grub version, and a screenshot of the boot error. – Mikhail Khirgiy Oct 16 '16 at 15:00
  • @MikhailKhirgiy Hi there, no grub, I am using LILO. You can already see a screenshot of where the boot halts (on top, dark background). I have updated the Q to show the `fstab`. – nass Oct 17 '16 at 08:07
  • Can you use grub2 instead of LILO? What version of Slackware Linux do you have? – Mikhail Khirgiy Oct 17 '16 at 15:42
  • @MikhailKhirgiy I am afraid grub is out of the question on this system, I could try with a different boot disk though. – nass Oct 17 '16 at 15:44
  • I think the problem is in the boot configuration. Your configuration boots from sda (md0 isn't a bootable device). You need to change the configuration to boot from a device named by uuid. – Mikhail Khirgiy Oct 17 '16 at 15:50
  • @MikhailKhirgiy Hmmm, I could try `uuid`s with `LILO`. I'll let you know how it goes. – nass Oct 17 '16 at 16:01
  • I haven't done it before, but I read that it is really possible with grub2. Try using the uuid of the sda partition which is used in the md0 software RAID. – Mikhail Khirgiy Oct 17 '16 at 16:07
  • @MikhailKhirgiy Unfortunately, as I suspected, it was not a problem with the disk uuids. It is somewhere before the bootloader kicks in. – nass Oct 17 '16 at 19:22
  • Most likely this is an issue with the cheap SATA controllers you are using. Their firmware could be buggy, which causes the issue. – Tero Kilkanen Oct 17 '16 at 23:46
  • Then try to change IRQ 15 on the inserted SATA controller or the onboard controller to a different, lower value. – Mikhail Khirgiy Oct 18 '16 at 05:07
  • @TeroKilkanen Indeed, that is the case. One out of 3 actually works, with an older firmware, whereas the other 2 do not work. It's either a PCI-e 2.0 to PCI-e 1.0 incompatibility (as Gigabyte motherboard support states) or really buggy controller firmware. Either way I am sending them back. – nass Oct 18 '16 at 10:25

1 Answer

It may sound extreme, but I would rule out a power supply issue by having all disks powered up, but with the data cables of the four new ones disconnected. If the server boots OK, connect the data cables one disk at a time, rebooting each time.

I have had issues like this one (while adding a 5th disk), and it turned out the power supply was able to spin up the disks, but they failed when being enumerated by the BIOS, causing all kinds of erratic behaviour, including a complete freeze.

Also, if you haven't done so and the downtime is acceptable, I would leave the frozen server alone for up to 30 minutes: there are cases where you get meaningful error messages after a very long timeout.

Pablo