
I have this output in dmesg:

[149939.146576] ata18.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[149939.146592] ata18.00: irq_stat 0x40000001
[149939.146600] ata18.00: cmd a0/00:00:00:04:00/00:00:00:00:00/a0 tag 2 pio 16388 in
                         Log Sense 4d 00 40 ff 00 00 00 00 04 00res 40/00:00:01:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
[149939.146615] ata18.00: status: { DRDY }
[156569.787604] ata18.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[156569.787613] ata18.00: irq_stat 0x40000001
[156569.787617] ata18.00: cmd a0/00:00:00:04:00/00:00:00:00:00/a0 tag 4 pio 16388 in
                         Log Sense 4d 00 40 ff 00 00 00 00 04 00res 40/00:00:00:00:00/00:00:00:00:00/08 Emask 0x1 (device error)
[156569.787624] ata18.00: status: { DRDY }

I'd like to know which block device corresponds to this output.

I've tried looking in /sys/block/*/device/scsi_device, /sys/block/*/device/scsi_disk, and /sys/block/*/device/scsi_generic, but those seem to be dead ends.

If I run ls -dal /sys/class/ata_port/ata18/device/host*, I get this:

drwxr-xr-x 5 root root 0 Jun 30 03:00 /sys/class/ata_port/ata18/device/host17

This seems to suggest that this ATA port belongs to host 17.

But if I run lsblk -S, I get this:

NAME HCTL       TYPE VENDOR   MODEL             REV SERIAL         TRAN
sda  4:0:0:0    disk ATA      ST3250318AS      CC38 ********       sata
sdb  5:0:0:0    disk ATA      Hitachi HDS5C303 A580 ********       sata
sdc  6:0:0:0    disk ATA      ST8000VN004-2M21 SC60 ********       sata
sdd  7:0:0:0    disk ATA      ST8000VN004-2M21 SC60 ********       sata
sde  8:0:0:0    disk ATA      ST8000VN004-2M21 SC60 ********       sata
sdf  9:0:0:0    disk ATA      ST8000VN004-2M21 SC60 ********       sata
sdg  10:0:0:0   disk ATA      ST8000VN004-2M21 SC60 ********       sata
sdh  11:0:0:0   disk ATA      ST8000VN004-2M21 SC60 ********       sata

None of those are host 17. Am I missing something here? Can someone please explain? Thanks.

EDIT: This is the output of ls -l /sys/block/sd*, as requested:

$ ls -l /sys/block/sd*
lrwxrwxrwx 1 root root 0 Jun 30 03:00 /sys/block/sda -> ../devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sda
lrwxrwxrwx 1 root root 0 Jun 30 03:00 /sys/block/sdb -> ../devices/pci0000:00/0000:00:1f.2/ata6/host5/target5:0:0/5:0:0:0/block/sdb
lrwxrwxrwx 1 root root 0 Jun 30 03:00 /sys/block/sdc -> ../devices/pci0000:00/0000:00:1f.2/ata7/host6/target6:0:0/6:0:0:0/block/sdc
lrwxrwxrwx 1 root root 0 Jun 30 03:00 /sys/block/sdd -> ../devices/pci0000:00/0000:00:1f.2/ata8/host7/target7:0:0/7:0:0:0/block/sdd
lrwxrwxrwx 1 root root 0 Jun 30 03:00 /sys/block/sde -> ../devices/pci0000:00/0000:00:1f.2/ata9/host8/target8:0:0/8:0:0:0/block/sde
lrwxrwxrwx 1 root root 0 Jun 30 03:00 /sys/block/sdf -> ../devices/pci0000:00/0000:00:1f.2/ata10/host9/target9:0:0/9:0:0:0/block/sdf
lrwxrwxrwx 1 root root 0 Jun 30 03:00 /sys/block/sdg -> ../devices/pci0000:00/0000:00:1c.2/0000:04:00.0/ata11/host10/target10:0:0/10:0:0:0/block/sdg
lrwxrwxrwx 1 root root 0 Jun 30 03:00 /sys/block/sdh -> ../devices/pci0000:00/0000:00:1c.2/0000:04:00.0/ata12/host11/target11:0:0/11:0:0:0/block/sdh
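
In the listing above, each symlink target already contains the ATA port name (`ataN`) just before the SCSI host (`hostM`), and in every row N is M + 1, which is consistent with ata18 sitting on host17. A minimal sketch of pulling that mapping out automatically follows. The function names `ata_port_of` and `list_block_ata_ports` are made up for illustration, and the optional sysfs-root argument exists only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Print the first path component that looks like "ataN", or nothing if none.
ata_port_of() {
    printf '%s\n' "$1" | tr '/' '\n' | grep -E '^ata[0-9]+$' | head -n 1
}

# Print "sdX ataN" for every sd* symlink under <sysroot>/block.
# $1 = sysfs root (hypothetical parameter, defaults to /sys).
list_block_ata_ports() {
    sysroot="${1:-/sys}"
    for link in "$sysroot"/block/sd*; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$link" ] || [ -L "$link" ] || continue
        target=$(readlink "$link") || continue
        printf '%s %s\n' "$(basename "$link")" "$(ata_port_of "$target")"
    done
}
```

Calling `list_block_ata_ports` with no argument reads the real `/sys/block`. If no line ends in `ata18`, no `sd*` device is attached to that port.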

EDIT: One thing I am considering is taking the array offline, pulling the drives one by one, and checking dmesg, but I'm waiting for a lengthy tape backup to finish first, which might take days.

EDIT: As requested, here are the filtered dmesg lines:

[   66.398867] ata18.00: failed to IDENTIFY (device reports invalid type, err_mask=0x0)
[   66.804074] ata18.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
[   66.816444] ata18.00: configured for UDMA/66
[149939.146576] ata18.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[149939.146592] ata18.00: irq_stat 0x40000001
[149939.146600] ata18.00: cmd a0/00:00:00:04:00/00:00:00:00:00/a0 tag 2 pio 16388 in
[149939.146615] ata18.00: status: { DRDY }
[156569.787604] ata18.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[156569.787613] ata18.00: irq_stat 0x40000001
[156569.787617] ata18.00: cmd a0/00:00:00:04:00/00:00:00:00:00/a0 tag 4 pio 16388 in
[156569.787624] ata18.00: status: { DRDY }
[218674.843962] ata18.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[218674.843974] ata18.00: irq_stat 0x40000001
[218674.843979] ata18.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 7 dma 16640 in
[218674.843989] ata18.00: status: { DRDY }
[218674.891963] ata18.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[218674.891979] ata18.00: irq_stat 0x40000001
[218674.891988] ata18.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 8 dma 16640 in
[218674.892005] ata18.00: status: { DRDY }

"ATAPI: MARVELL VIRTUALL" seems suspicious; I'm not sure what that is.
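
A filtered listing like the one above can be produced with a small helper. This is only a sketch: `filter_ata_port` is a hypothetical name, and the pattern assumes log lines carry the usual `ataN:` or `ataN.MM:` prefix. It deliberately avoids matching longer port numbers such as `ata180` when asked for port 18.

```shell
#!/bin/sh
# Read kernel log text on stdin and print only the lines for one ATA port.
# $1 = port number, e.g. 18. Matches "ata18:" (the port itself) and
# "ata18.00:" (a device on it), but not "ata180:" or "ata1.00:".
filter_ata_port() {
    grep -E "(^|[^[:alnum:]])ata$1(\.[0-9]+)?:"
}
```

Typical use would be `dmesg | filter_ata_port 18`. Indented continuation lines that do not repeat the `ata18.00:` prefix are dropped by this filter.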

Dave Butler
  • there's probably a "better" way, but `find /sys/devices | grep 'ata18/host17/.*/block/' | head -1` could work - though, probably won't work since you can't find anything in `/sys/block` ... have you considered this error may result in the device not being assigned to a "drive"? Perhaps search dmesg (and logs) for `ata18.00` to get more info – Jaromanda X Jul 02 '23 at 23:47
  • @JaromandaX I'm not sure what ATA device there would be that wasn't assigned to a block device. There are only 8 SATA hard drives in the machine physically, no optical drives, and there are 8 block devices recognized by Linux – Dave Butler Jul 03 '23 at 00:40
  • @JaromandaX The dmesg logs are interleaved, so it's hard to see what is related to what – Dave Butler Jul 03 '23 at 00:49
  • yet, a `grep` for ata18.00 would only show information for ata18.00 ... not sure how interleaving would be an issue – Jaromanda X Jul 03 '23 at 00:53
  • @JaromandaX sorry, I thought maybe context from nearby lines (not containing `ata18`) would be required... anyway, I have updated the question with a full grep... `ATAPI: MARVELL VIRTUALL` seems suspicious... – Dave Butler Jul 03 '23 at 01:11
  • not suspicious if you search for it - it seems to be some sort of RAID controller – Jaromanda X Jul 03 '23 at 02:35
  • OK, @JaromandaX, you have basically helped me find a direct answer to this question, which, put simply, is: "There is no block device for this ATA port; to see what device it is, grep dmesg." But that leaves a different question: why are we seeing these errors? AFAIK, nothing is using this "VIRTUALL" device, but I can dig into this. Thanks – Dave Butler Jul 03 '23 at 03:35
  • Is it really with double L at the end? Anyway, perhaps the output of `dmidecode` or `lshw` could shed some light on what that device is – Jaromanda X Jul 03 '23 at 04:18

2 Answers


Try this:

find /sys/class/ata_port/ata18/device/ -name block -exec ls {} +

Or this:

ls /sys/class/ata_port/ata18/device/host*/target*/*/block

I think you get the idea. There is a good chance that ata18 does not have a block device on it, in which case there will be no sdX for it on the system. Is it a DVD drive, for example?

The following may give you a good idea what this is:

find /sys/class/ata_port/ata18/device/ -name model -exec cat {} +
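
The commands above can be combined into one helper that reports both the model string and any block devices behind a port. This is a rough sketch rather than a polished tool: `describe_ata_port` is a hypothetical name, and the second argument (a sysfs root) is an assumption added so the function can be exercised against a test directory instead of the real `/sys`.

```shell
#!/bin/sh
# Describe what sits behind an ATA port: model strings and block devices.
# $1 = port name such as ata18
# $2 = sysfs root (hypothetical parameter, defaults to /sys)
describe_ata_port() {
    port="$1"
    sysroot="${2:-/sys}"
    dev="$sysroot/class/ata_port/$port/device"
    if [ ! -d "$dev" ]; then
        echo "$port: no such port under $sysroot" >&2
        return 1
    fi
    # The trailing slash makes find descend through the "device" symlink.
    models=$(find "$dev/" -type f -name model -exec cat {} + 2>/dev/null)
    blocks=$(find "$dev/" -type d -name block -exec ls {} + 2>/dev/null)
    echo "$port model: ${models:-unknown}"
    echo "$port block devices: ${blocks:-none}"
}
```

Running `describe_ata_port ata18` on the affected machine should show whether the port has any block device at all. A result of `none` would match the idea that ata18 is not one of the sdX disks.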
chutz

You can use smartctl to quickly check all the disks: smartctl -H /dev/sdX gives a health summary, and smartctl -a /dev/sdX gives more detailed information.

Also, have you tried just ls -l /sys/block/sd*?

Ivangelion
  • I added the output of the command you specified. I don't see anything related to ata18, or host 17 – Dave Butler Jul 03 '23 at 00:33
  • I also ran the `smartctl` commands on all of them, and they all appear to have passed. I suspect that the problem isn't actually in a drive, but in the controller or a loose SATA cable – Dave Butler Jul 03 '23 at 00:35
  • Try `smartctl -a` anyway and take a close look at the parameters `Reallocated_Sector_Ct`, `Reallocated_Event_Count`, `Current_Pending_Sector`, `Reported_Uncorrect`, `ATA Error Count`, and any others whose names look suspicious to you – Ivangelion Jul 03 '23 at 01:03
  • There may also be useful information in the system logs (depending on your system): /var/log/messages or /var/log/syslog. Also see if there are any messages from other devices in /var/log/dmesg. I think you are right that the problem is really in the cable or controller, so some disk is "reconnecting". – Ivangelion Jul 03 '23 at 01:09
  • I also found an [old thread](https://bbs.archlinux.org/viewtopic.php?id=197205) where a similar problem occurred. The most interesting comment is the last one. I think you should check whether the problem is the Seagate HDDs, which may work properly but still report errors. – Ivangelion Jul 03 '23 at 01:18