
About once every two weeks, I get this kind of error in my kernel log:

[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: attempting task abort! scmd(000000006f6a751f)
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: [sde] tag#3471 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Wed Jul  6 16:11:14 2022] scsi target0:0:4: handle(0x001d), sas_address(0x443322110b000000), phy(11)
[Wed Jul  6 16:11:14 2022] scsi target0:0:4: enclosure logical id(0x500062b206412140), slot(17) 
[Wed Jul  6 16:11:14 2022] scsi target0:0:4: enclosure level(0x0000), connector name(     )
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: task abort: SUCCESS scmd(000000006f6a751f)
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: attempting task abort! scmd(000000005203b095)
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: [sde] tag#3012 CDB: Read(16) 88 00 00 00 00 02 a5 27 a8 48 00 00 01 00 00 00
[Wed Jul  6 16:11:14 2022] scsi target0:0:4: handle(0x001d), sas_address(0x443322110b000000), phy(11)
[Wed Jul  6 16:11:14 2022] scsi target0:0:4: enclosure logical id(0x500062b206412140), slot(17) 
[Wed Jul  6 16:11:14 2022] scsi target0:0:4: enclosure level(0x0000), connector name(     )
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: task abort: SUCCESS scmd(000000005203b095)
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: [sde] tag#3012 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: [sde] tag#3012 CDB: Read(16) 88 00 00 00 00 02 a5 27 a8 48 00 00 01 00 00 00
[Wed Jul  6 16:11:14 2022] print_req_error: I/O error, dev sde, sector 11360774216
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: attempting task abort! scmd(00000000baf88a87)
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: [sde] tag#3011 CDB: Read(16) 88 00 00 00 00 02 a5 27 a3 48 00 00 01 00 00 00
[Wed Jul  6 16:11:14 2022] scsi target0:0:4: handle(0x001d), sas_address(0x443322110b000000), phy(11)
[Wed Jul  6 16:11:14 2022] scsi target0:0:4: enclosure logical id(0x500062b206412140), slot(17) 
[Wed Jul  6 16:11:14 2022] scsi target0:0:4: enclosure level(0x0000), connector name(     )
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: task abort: SUCCESS scmd(00000000baf88a87)
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: [sde] tag#3011 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: [sde] tag#3011 CDB: Read(16) 88 00 00 00 00 02 a5 27 a3 48 00 00 01 00 00 00
[Wed Jul  6 16:11:14 2022] print_req_error: I/O error, dev sde, sector 11360772936
[Wed Jul  6 16:11:14 2022] sd 0:0:4:0: Power-on or device reset occurred
[Wed Jul  6 16:11:15 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#2451 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#3453 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#3200 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#3453 CDB: Read(16) 88 00 00 00 00 05 74 ff fd 20 00 00 00 08 00 00
[Wed Jul  6 16:11:15 2022] print_req_error: I/O error, dev sde, sector 23437770016
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#2451 CDB: Read(16) 88 00 00 00 00 01 fd 8e 63 38 00 00 01 00 00 00
[Wed Jul  6 16:11:15 2022] print_req_error: I/O error, dev sde, sector 8548934456
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#3200 CDB: Read(16) 88 00 00 00 00 01 fd 8e 64 38 00 00 01 00 00 00
[Wed Jul  6 16:11:15 2022] print_req_error: I/O error, dev sde, sector 8548934712
[Wed Jul  6 16:11:15 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:15 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:15 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: Power-on or device reset occurred
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#2050 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#2504 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#2050 CDB: Write(16) 8a 00 00 00 00 05 26 99 8f 68 00 00 00 08 00 00
[Wed Jul  6 16:11:15 2022] print_req_error: I/O error, dev sde, sector 22122434408
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#2504 CDB: Read(16) 88 00 00 00 00 00 00 00 20 00 00 00 00 08 00 00
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#3203 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#3203 CDB: Read(16) 88 00 00 00 00 02 a5 27 ad 48 00 00 01 00 00 00
[Wed Jul  6 16:11:15 2022] print_req_error: I/O error, dev sde, sector 11360775496
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#2505 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 16:11:15 2022] sd 0:0:4:0: [sde] tag#2505 CDB: Read(16) 88 00 00 00 00 02 a5 27 ac 48 00 00 01 00 00 00
[Wed Jul  6 16:11:15 2022] print_req_error: I/O error, dev sde, sector 11360775240
[Wed Jul  6 16:11:15 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:15 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:15 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:15 2022] print_req_error: I/O error, dev sde, sector 8192
[Wed Jul  6 16:11:15 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:16 2022] sd 0:0:4:0: Power-on or device reset occurred
[Wed Jul  6 16:11:16 2022] sd 0:0:4:0: [sde] tag#2615 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 16:11:16 2022] print_req_error: I/O error, dev sde, sector 22122434448
[Wed Jul  6 16:11:16 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:16 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:16 2022] sd 0:0:4:0: [sde] tag#2615 CDB: Write(16) 8a 00 00 00 00 05 26 99 8f a0 00 00 00 08 00 00
[Wed Jul  6 16:11:16 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:16 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:16 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:16 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:16 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 16:11:16 2022] sd 0:0:4:0: Power-on or device reset occurred
[Wed Jul  6 16:11:17 2022] sd 0:0:4:0: Power-on or device reset occurred
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: attempting task abort! scmd(00000000685dac60)
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: [sdi] tag#371 CDB: Read(16) 88 00 00 00 00 05 23 d4 00 e0 00 00 01 00 00 00
[Wed Jul  6 17:31:04 2022] scsi target0:0:8: handle(0x0021), sas_address(0x4433221113000000), phy(19)
[Wed Jul  6 17:31:04 2022] scsi target0:0:8: enclosure logical id(0x500062b206412140), slot(9) 
[Wed Jul  6 17:31:04 2022] scsi target0:0:8: enclosure level(0x0000), connector name(     )
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: task abort: SUCCESS scmd(00000000685dac60)
[Wed Jul  6 17:31:04 2022] scsi_io_completion_action: 6 callbacks suppressed
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: [sdi] tag#371 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: [sdi] tag#371 CDB: Read(16) 88 00 00 00 00 05 23 d4 00 e0 00 00 01 00 00 00
[Wed Jul  6 17:31:04 2022] print_req_error: 6 callbacks suppressed
[Wed Jul  6 17:31:04 2022] print_req_error: I/O error, dev sdi, sector 22075932896
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: attempting task abort! scmd(00000000c7dc4ce2)
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: [sdi] tag#370 CDB: Read(16) 88 00 00 00 00 05 23 d3 ea e0 00 00 01 00 00 00
[Wed Jul  6 17:31:04 2022] scsi target0:0:8: handle(0x0021), sas_address(0x4433221113000000), phy(19)
[Wed Jul  6 17:31:04 2022] scsi target0:0:8: enclosure logical id(0x500062b206412140), slot(9) 
[Wed Jul  6 17:31:04 2022] scsi target0:0:8: enclosure level(0x0000), connector name(     )
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: task abort: SUCCESS scmd(00000000c7dc4ce2)
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: [sdi] tag#370 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: [sdi] tag#370 CDB: Read(16) 88 00 00 00 00 05 23 d3 ea e0 00 00 01 00 00 00
[Wed Jul  6 17:31:04 2022] print_req_error: I/O error, dev sdi, sector 22075927264
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: attempting task abort! scmd(00000000d5697c0a)
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: [sdi] tag#16 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Wed Jul  6 17:31:04 2022] scsi target0:0:8: handle(0x0021), sas_address(0x4433221113000000), phy(19)
[Wed Jul  6 17:31:04 2022] scsi target0:0:8: enclosure logical id(0x500062b206412140), slot(9) 
[Wed Jul  6 17:31:04 2022] scsi target0:0:8: enclosure level(0x0000), connector name(     )
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: task abort: SUCCESS scmd(00000000d5697c0a)
[Wed Jul  6 17:31:04 2022] sd 0:0:8:0: Power-on or device reset occurred
[Wed Jul  6 17:31:05 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:05 2022] sd 0:0:8:0: [sdi] tag#4 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 17:31:05 2022] sd 0:0:8:0: [sdi] tag#4 CDB: Read(16) 88 00 00 00 00 00 00 00 00 08 00 00 00 08 00 00
[Wed Jul  6 17:31:05 2022] print_req_error: I/O error, dev sdi, sector 8
[Wed Jul  6 17:31:05 2022] sd 0:0:8:0: [sdi] tag#736 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 17:31:05 2022] sd 0:0:8:0: [sdi] tag#736 CDB: Read(16) 88 00 00 00 00 04 c8 4d fc 38 00 00 00 08 00 00
[Wed Jul  6 17:31:05 2022] print_req_error: I/O error, dev sdi, sector 20540423224
[Wed Jul  6 17:31:05 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:05 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:05 2022] sd 0:0:8:0: [sdi] tag#735 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 17:31:05 2022] sd 0:0:8:0: [sdi] tag#735 CDB: Read(16) 88 00 00 00 00 04 70 9a 87 30 00 00 01 00 00 00
[Wed Jul  6 17:31:05 2022] print_req_error: I/O error, dev sdi, sector 19069044528
[Wed Jul  6 17:31:05 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:05 2022] sd 0:0:8:0: Power-on or device reset occurred
[Wed Jul  6 17:31:06 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#5726 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#5723 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#5726 CDB: Read(16) 88 00 00 00 00 01 53 df 28 00 00 00 01 00 00 00
[Wed Jul  6 17:31:06 2022] print_req_error: I/O error, dev sdi, sector 5702100992
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#939 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#5723 CDB: Read(16) 88 00 00 00 00 05 74 ff fc 20 00 00 00 08 00 00
[Wed Jul  6 17:31:06 2022] print_req_error: I/O error, dev sdi, sector 23437769760
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#939 CDB: Read(16) 88 00 00 00 00 05 23 d3 fc e0 00 00 01 00 00 00
[Wed Jul  6 17:31:06 2022] print_req_error: I/O error, dev sdi, sector 22075931872
[Wed Jul  6 17:31:06 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:06 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:06 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: Power-on or device reset occurred
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#5738 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#5693 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[Wed Jul  6 17:31:06 2022] print_req_error: I/O error, dev sdi, sector 22238540184
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#5693 CDB: Write(16) 8a 00 00 00 00 00 b9 9c 77 18 00 00 01 00 00 00
[Wed Jul  6 17:31:06 2022] print_req_error: I/O error, dev sdi, sector 3114039064
[Wed Jul  6 17:31:06 2022] sd 0:0:8:0: [sdi] tag#5738 CDB: Read(16) 88 00 00 00 00 05 74 ff ff 88 00 00 00 38 00 00
[Wed Jul  6 17:31:06 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:06 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:06 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:06 2022] mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Wed Jul  6 17:31:07 2022] sd 0:0:8:0: Power-on or device reset occurred
[Wed Jul  6 17:31:07 2022] sd 0:0:8:0: Power-on or device reset occurred

I have about 20 SATA drives attached to the SATA/SAS controller on this server. The error occurs with many (though not all) of them, and some drives trigger it more often than others. It seems to be related to filesystem load: the heavier the load, the more likely the errors. Until today, the issue only ever affected one drive at a time, and all my drives are mirrored, so I've been able to resilver the faulted mirror whenever a fault occurred. Over the two years this problem has been plaguing me, I've searched Google and various support forums from time to time without success. Today, however, both drives in a 2-drive mirror experienced the same fault within one hour, which makes solving this more urgent. I guess it could be a hardware/controller problem, but I don't know how to check whether that's the case, or how to fix it if it is. Any help would be appreciated. Thank you.
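To get a clearer picture of which drives and enclosure slots are affected, and how, the log could be tallied with something like the following. This is only a minimal sketch: the function names (`tally_failures`, `read16_lba`) and the regular expressions are my own assumptions, written against the exact line format shown above, and would need adjusting if the kernel prints these messages differently on another system.

```python
import re
from collections import Counter

# Patterns written against the log lines quoted above (an assumption:
# other kernel versions may format these messages differently).
FAILED_RE = re.compile(r"\[(sd[a-z]+)\] tag#\d+ FAILED Result: hostbyte=(\w+)")
DEV_RE = re.compile(r"sd (\d+:\d+:\d+):\d+: \[(sd[a-z]+)\]")
SLOT_RE = re.compile(r"scsi target(\d+:\d+:\d+): .*slot\((\d+)\)")


def tally_failures(lines):
    """Count FAILED results per (device, hostbyte) and map devices to slots."""
    failures = Counter()
    target_to_dev = {}
    target_to_slot = {}
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            failures[(m.group(1), m.group(2))] += 1
        m = DEV_RE.search(line)
        if m:
            target_to_dev[m.group(1)] = m.group(2)
        m = SLOT_RE.search(line)
        if m:
            target_to_slot[m.group(1)] = int(m.group(2))
    # Join the two maps on the host:channel:target triple.
    dev_to_slot = {
        dev: target_to_slot[t]
        for t, dev in target_to_dev.items()
        if t in target_to_slot
    }
    return failures, dev_to_slot


def read16_lba(cdb_hex):
    """Extract the starting LBA (bytes 2..9, big-endian) from a Read(16)/Write(16) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    return int.from_bytes(cdb[2:10], "big")


SAMPLE = [
    "sd 0:0:4:0: [sde] tag#3012 CDB: Read(16) 88 00 00 00 00 02 a5 27 a8 48 00 00 01 00 00 00",
    "scsi target0:0:4: enclosure logical id(0x500062b206412140), slot(17) ",
    "sd 0:0:4:0: [sde] tag#3012 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK",
    "sd 0:0:4:0: [sde] tag#2451 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK",
    "scsi target0:0:8: enclosure logical id(0x500062b206412140), slot(9) ",
    "sd 0:0:8:0: [sdi] tag#371 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK",
]

failures, dev_to_slot = tally_failures(SAMPLE)
for (dev, hostbyte), count in sorted(failures.items()):
    print(dev, "slot", dev_to_slot.get(dev), hostbyte, count)
```

Fed the full kernel log instead of `SAMPLE`, the counts per device and slot should show whether the failures cluster on particular bays or controller ports. `read16_lba` is a cross-check that the CDB bytes agree with the sector reported by `print_req_error`: the first Read(16) on sde decodes to 11360774216, the same sector the log reports.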

  • This is a common and pretty complex issue. For some reason, some of your devices reset, and this causes read/write timeouts. If your disks are not SMR and appear healthy when scanned outside the system in question, then the culprit may be: the power supply, the SAS enclosure itself, the enclosure SAS cables or power distribution cables to the enclosure (Molex and such), or drive firmware incompatibility with the controller. The above is sorted in reverse order of the expected time of appearance from the deployment date, i.e. firmware and cabling cause issues earlier than enclosure/PSU. – Peter Zhabin Jul 07 '22 at 18:29
  • Thank you! I'm now getting this several times a day. It seems to occur most often on a busy zpool (where I have VMs) while scrubbing another zpool - especially a zpool which contains vdevs with ashift=9 made up of 4kn drives (for legacy reasons & now I can't get rid of ashift=9). Drives are CMR, good, and it happens with both Western Digital and Toshiba drives - various models. Various drive bays across various rows and various columns. Is there any way to try to pinpoint the culprit (very preferably without shutting down the server)? Thank you! – user52932 Feb 17 '23 at 01:33
  • I would consider changing the power supply in the first place if the situation seems to worsen over time – Peter Zhabin Feb 17 '23 at 07:57
  • Thank you. The server has a redundant PSU (2 PSU modules). I've replaced both the modules. The errors continue as before. Each module can supply 800W (190W on the +3.3V and +5V rails), and each module is supplying 125W, so the power supply seems to have plenty of headroom. If I remove one module, the other module drives the full 250W with no trouble. I also checked all the cables. All SATA connectors and all Molex power connectors are properly inserted. 6 SAS/SATA ports are utilized on the controller, each supplying 4 disks. The problem occurs across all 6 controller ports. What next? Thanks. – user52932 Mar 17 '23 at 20:00
