
Is it possible that zpool status is reporting the status for two physically different drives while printing the same name for both?

Specifically, I have a drive sdq listed as FAULTED in the spares list, but another sdq listed as ONLINE in raidz2-2. How can one drive be listed as FAULTED and ONLINE simultaneously? Or could these be two different drives with the same device name but different serial numbers?

One hypothesis is that the old, faulted sdq was physically removed (although it is still present in ZFS's data structures) and that a newly inserted drive was assigned the name sdq and is now the one showing ONLINE.
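
If these really are two different physical drives, each should carry its own GUID in its on-disk vdev label. A sketch of how I could check this (assuming a whole-disk vdev, where ZFS normally puts its data partition at partition 1):

    # dump the ZFS vdev label; the guid field identifies the physical drive
    zdb -l /dev/sdq1

Comparing the guid of whatever is currently called sdq against what the pool expects should show whether the ONLINE sdq and the FAULTED spare sdq are the same device.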

Relatedly, how can I get zfs to tell me the serial numbers (or other identifiers) of each drive in zpool status?
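
Outside of ZFS, I can at least cross-reference the kernel names against serial numbers, e.g. with:

    # serial numbers as reported by the kernel
    lsblk -o NAME,SERIAL
    # or via the persistent symlinks, whose names usually embed the serial
    ls -l /dev/disk/by-id

but it would be nicer to get this from zpool status itself.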

Output of `zpool status zfsstorage`:

  pool: zfsstorage
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 896K in 38h15m with 0 errors on Mon May 11 14:39:46 2020
config:

    NAME        STATE     READ WRITE CKSUM
    zfsstorage  DEGRADED     0     0     0
      raidz2-0  ONLINE       0     0     0
        sda     ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sde     ONLINE       0     0     0
        sdf     ONLINE       0     0     0
      raidz2-1  ONLINE       0     0     0
        sdg     ONLINE       0     0     0
        sdh     ONLINE       0     0     0
        sdi     ONLINE       0     0     0
        sdj     ONLINE       0     0     0
        sdk     ONLINE       0     0     0
        sdl     ONLINE       0     0     0
      raidz2-2  ONLINE       0     0     0
        sdm     ONLINE       0     0     0
        sdae    ONLINE       0     0     0
        sdo     ONLINE       0     0     0
        sdp     ONLINE       0     0     0
        sdq     ONLINE       0     0     0
        sdr     ONLINE       0     0     0
      raidz2-3  ONLINE       0     0     0
        sds     ONLINE       0     0     0
        sdt     ONLINE       0     0     0
        sdu     ONLINE       0     0     0
        sdv     ONLINE       0     0     0
        sdw     ONLINE       0     0     0
        sdx     ONLINE       0     0     0
      raidz2-4  ONLINE       0     0     0
        sdy     ONLINE       0     0     0
        sdz     ONLINE       0     0     0
        sdaa    ONLINE       0     0     0
        sdab    ONLINE       0     0     0
        sdac    ONLINE       0     0     0
        sdad    ONLINE       0     0     0
      raidz2-6  DEGRADED     0     0     0
        sdak    ONLINE       0     0     1
        sdal    ONLINE       0     0     4
        sdam    DEGRADED     0     0    21  too many errors
        sdan    ONLINE       0     0     4
        sdao    ONLINE       0     0     0
        sdap    ONLINE       0     0     0
    logs
      mirror-5  ONLINE       0     0     0
        sdag    ONLINE       0     0     0
        sdah    ONLINE       0     0     0
    cache
      sdai      ONLINE       0     0     0
      sdaj      ONLINE       0     0     0
    spares
      sdaf      AVAIL
      sdq       FAULTED   corrupted data

This is with Debian 9.12 x86-64, zfs-dkms 0.6.5.9-5.

Andrew Straw
  • You should not create zpools using the `sd*` identifiers as these can change arbitrarily. Instead use the identifiers in `/dev/disk/by-id`, which are guaranteed to remain the same for any particular drive and which also make it much easier to identify which specific drive you need to be looking at. – Michael Hampton Jun 20 '20 at 18:04
  • Yes, the vendor set the system up with the `sd*` identifiers and new drives we have added use `by-id`. (This is an older cut and paste from my logs, so the new drives are not showing up here.) – Andrew Straw Jun 20 '20 at 21:12
  • (which vendor?) – ewwhite Jun 21 '20 at 17:19
  • Based on @ewwhite's answer, I think this is actually my "fault" (or Debian's) and wouldn't want to name any particular vendor. I'm the one who asked them to install Debian and they kindly did so. – Andrew Straw Jun 24 '20 at 15:20

1 Answer


This happens to Debian and Ubuntu folks all the time. Those distributions seem to have trouble keeping SCSI device enumeration consistent across reboots and upgrades.

You can export the pool, re-import it with `zpool import -d /dev/disk/by-id`, and check the result.
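
For this pool, that would look something like:

    zpool export zfsstorage
    zpool import -d /dev/disk/by-id zfsstorage

Afterwards, zpool status should show each disk by its stable /dev/disk/by-id name, which typically includes the model and serial number.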

ewwhite
  • Thanks, that worked! (I did have to wait until I could take the pool offline.) Strangely, the FAULTED status has now disappeared, too. – Andrew Straw Jun 24 '20 at 15:19