
I have two openSUSE 11.4 hosts connected to an LSI CTS2600 storage array via SAS. Every time I reboot the hosts, I see output like this in dmesg:

[ 255.942890] end_request: I/O error, dev sdg, sector 8
[ 256.445301] sd 5:0:1:1: [sdg] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 256.445308] sd 5:0:1:1: [sdg] Sense Key : Illegal Request [current]
[ 256.445315] sd 5:0:1:1: [sdg] <> ASC=0x94 ASCQ=0x1ASC=0x94 ASCQ=0x1
[ 256.445326] sd 5:0:1:1: [sdg] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00

It just so happens that the devices with the reported I/O error are always the devices in the passive path group.
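
For anyone wanting to make the same comparison, here is a minimal sketch of how the affected device names can be pulled out of the kernel log and then checked by hand against the passive path group. The function name `failed_devs` is hypothetical, and the pattern assumes the `end_request: I/O error, dev ..., sector ...` message format shown above:

```shell
#!/bin/sh
# failed_devs (hypothetical helper): read kernel log text on stdin and
# print each device that reported an end_request I/O error, once each.
failed_devs() {
    # Keep only the device name from matching lines, then de-duplicate.
    sed -n 's/.*end_request: I\/O error, dev \([a-z0-9]*\), sector.*/\1/p' | sort -u
}

# Typical use on a live host:
#   dmesg | failed_devs
```

The resulting list can then be compared with the path groups that the multipath tools report for each LUN.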

First, I'm wondering: why does this happen? I assume it has something to do with the system seeing the attached SAS hardware and querying it before the proper device drivers and/or software are loaded, but I'm not positive.

Second, what can I do to stop this from happening? Besides increasing the boot time, since the system sits there and re-queries the device again and again, it looks bad in the logs, kicks off Nagios alerts, and is generally just annoying.

Since I feel like it's related in some fashion to drivers or modules, here's some boot information:

INITRD_MODULES: dm-multipath, mptbase, mpt2sas, mptscsi, mptspi, mptsas, 3w-sas, thermal, ata_generic, processor, fan

MODULES_LOADED_ON_BOOT: drbd, dm-multipath

It looks to me like I've got my bases covered with the INITRD_MODULES, but I'm not sure.

Kendall

1 Answer


Your array looks to be the OEM's version of a Dell MD3220, right? I have an MD3200i, which is the LFF, iSCSI version.

I had similar errors on the secondary path group, caused by multipath trying to use or check (I'm not sure which) all existing paths to the LUN.

I'm not sure that the RDAC SCSI device handler module will help in your case; my Debian host has the following:

23:13:29 root@u14-0bA-site3:~> grep -v '^#' /etc/initramfs-tools/modules 
scsi_dh_rdac

Out of the box, it's the only change I needed to get up and running, albeit with lousy performance, which is where a SAS-attached version like yours would have come in handy.
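
On openSUSE the equivalent change would go into the initrd module list rather than `/etc/initramfs-tools/modules`. Below is a minimal sketch, assuming the list is a quoted, space-separated `INITRD_MODULES="..."` line in `/etc/sysconfig/kernel` and that the initrd is rebuilt with `mkinitrd` afterwards. The helper name `add_initrd_module` is hypothetical; treat this as a starting point rather than a verified procedure for 11.4:

```shell
#!/bin/sh
# add_initrd_module FILE MODULE (hypothetical helper):
# append MODULE to the INITRD_MODULES="..." line in FILE unless it is
# already listed there. Assumes a quoted, space-separated module list.
add_initrd_module() {
    file=$1
    module=$2
    # Nothing to do if the module already appears as a whole word in the list.
    if grep -Eq "^INITRD_MODULES=\"([^\"]* )?${module}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert the module just before the closing quote, via a temporary file.
    sed "s/^INITRD_MODULES=\"\(.*\)\"/INITRD_MODULES=\"\1 ${module}\"/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Assumed usage on an openSUSE host (as root), followed by an initrd rebuild:
#   add_initrd_module /etc/sysconfig/kernel scsi_dh_rdac && mkinitrd
# On Debian, after editing /etc/initramfs-tools/modules:
#   update-initramfs -u
```

Whether `scsi_dh_rdac` is the right handler for a given array is something to confirm against the array vendor's documentation.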

Luis Bruno
  • Those messages are blowback from the SCSI mid-layer (SML) as it tries to interrogate the device without knowing it's part of a multipath configuration. To avoid this, you need to install the proper device handler (dh) module in the initrd so that when the SML interrogates the device, it knows better and backs off sooner, or altogether, during the probe process. The required module should be well documented in your SAN docs, and Dell customer support should be able to tell you easily. It might be scsi_dh_rdac, it might be something else. Bottom line, your installation wasn't configured correctly to begin with. – ppetraki Feb 07 '12 at 16:19
  • Perhaps that was meant as an answer to Kendall? I say this because I had already loaded the device_handler module in the initrd. The performance, however, was not stellar -- that's what I needed to tune. But to get multipath working, out-of-the-box you'll need the device_handler module and nothing else. – Luis Bruno Feb 07 '12 at 20:21
  • It was; your answer is the correct one. I was merely elaborating on why the SCSI protocol is responding in that manner. I didn't feel that it was necessary to edit your original answer. – ppetraki Feb 07 '12 at 22:33
  • Thanks for the answer, I can't wait to check this out. Having this finally cleared up will be very nice. – Kendall Feb 08 '12 at 02:15
  • Thank you, @ppetraki. And Kendall, your other post on "not using all paths" was also very useful for my own debugging. Thank you both. – Luis Bruno Feb 08 '12 at 08:22