
I just replaced a hard drive that was part of two different redundant pools, and now both pools are unavailable...

Details:

  • There are four drives: 2x4TB (da0 and ada1) and 2x3TB (da1 and da2).
  • One pool is a RAIDZ1 consisting of both 3TB drives in their entirety and a 3TB partition on each of the 4TB drives.
  • The other pool is a mirror consisting of the remaining space on the two bigger drives.
  • I replaced one of the 4TB drives (da0) with another of the same size; the rough plan is sketched below...
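
The plan was roughly the following; the partition sizes and exact commands are a sketch of the intended procedure, not a transcript of what was actually typed:

    # partition the replacement 4TB drive to match the old one's layout
    gpart create -s gpt da0
    gpart add -t freebsd-zfs -s <size-of-old-da0p1> da0   # da0p1: raidz1 member for "aldan"
    gpart add -t freebsd-zfs da0                          # da0p2: mirror member for "lusterko"

    # then swap each partition into its pool (the old-member GUIDs are placeholders)
    zpool replace aldan    <old-da0p1-guid> /dev/da0p1
    zpool replace lusterko <old-da0p2-guid> /dev/da0p2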

I expected both pools to go into "degraded" mode until I partitioned the replacement into the two parts and added each part to its pool.

Instead the computer rebooted unceremoniously and, upon coming back, both pools are "unavailable":

      pool: aldan
     state: UNAVAIL
    status: One or more devices could not be opened.  There are insufficient
            replicas for the pool to continue functioning.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://illumos.org/msg/ZFS-8000-3C
      scan: none requested
    config:

            NAME                      STATE     READ WRITE CKSUM
            aldan                     UNAVAIL      0     0     0
              raidz1-0                UNAVAIL      0     0     0
                1257549909357337945   UNAVAIL      0     0     0  was /dev/ada1p1
                1562878286621391494   UNAVAIL      0     0     0  was /dev/da1
                8160797608248051182   UNAVAIL      0     0     0  was /dev/da0p1
                15368186966842930240  UNAVAIL      0     0     0  was /dev/da2
            logs
              4588208516606916331     UNAVAIL      0     0     0  was /dev/ada0e

      pool: lusterko
     state: UNAVAIL
    status: One or more devices could not be opened.  There are insufficient
            replicas for the pool to continue functioning.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://illumos.org/msg/ZFS-8000-3C
      scan: none requested
    config:

            NAME                     STATE     READ WRITE CKSUM
            lusterko                 UNAVAIL      0     0     0
              mirror-0               UNAVAIL      0     0     0
                623227817903401316   UNAVAIL      0     0     0  was /dev/ada1p2
                7610228227381804026  UNAVAIL      0     0     0  was /dev/da0p2

I have now partitioned the new drive, but attempts to "zpool replace" are rebuffed with "pool is unavailable". I'm pretty sure that if I simply disconnect the new drive, both pools will become OK (if degraded). Why are they both "unavailable" now? All of the devices are online, according to camcontrol:

<ATA TOSHIBA MG03ACA4 FL1A>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA Hitachi HUS72403 A5F0>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA TOSHIBA HDWD130 ACF0>         at scbus0 target 2 lun 0 (pass2,da2)
<M4-CT128M4SSD2 0309>              at scbus1 target 0 lun 0 (pass3,ada0)
<MB4000GCWDC HPGI>                 at scbus2 target 0 lun 0 (pass4,ada1)

The OS is FreeBSD-11.3-STABLE/amd64. What's wrong?
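
In case it matters, the replace attempts looked roughly like this (the GUID is taken from the zpool status output above; the exact invocation is my reconstruction, not a copy-paste):

    zpool replace aldan 8160797608248051182 /dev/da0p1
    # ... rebuffed with "pool is unavailable"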

Update: no, I didn't explicitly offline the device(s) before unplugging the disk -- and it is already on its way back to Amazon. I'm surprised such offlining is necessary -- shouldn't ZFS be able to handle the sudden death of any drive? And shouldn't it, likewise, be prepared for a technician replacing the failed drive with another? Why is it throwing a fit like this?
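
For the record, the offlining step I skipped would presumably have been something like this (partition names as in the status output above):

    # before physically pulling the old 4TB drive
    zpool offline aldan    /dev/da0p1
    zpool offline lusterko /dev/da0p2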

I have backups and can rebuild the pools from scratch -- but I'd like to figure out how to avoid doing that. Or, if that's not possible, to file a proper bug report...

I unplugged the new drive completely, but the pools' status hasn't changed... Maybe I need to reboot -- whether or not that helps, it is quite a disappointment.

Update 2: multiple reboots, with and without the new disk attached, did not help. However, zpool import lists both pools just as I'd expect them: degraded (but available!). For example:

   pool: lusterko
     id: 11551312344985814621
  state: DEGRADED
 status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://illumos.org/msg/ZFS-8000-2Q
 config:

        lusterko                  DEGRADED
          mirror-0                DEGRADED
            ada1p2                ONLINE
            12305582129131953320  UNAVAIL  cannot open

But zpool status continues to insist that all devices are unavailable... Any hope?

Mikhail T.

1 Answer


Perhaps you also did not offline the old drive prior to removing it. (It is possible that ZFS thinks the logical drives -- your pools -- are corrupted while the controller thinks they are fine. This can happen if there is a difference in disk geometry (cylinder size) -- a rare case, but it does happen.)
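
One way to check for such a size/geometry mismatch on FreeBSD (the asker's platform) is something along these lines; da0 is the replaced disk from the question:

    diskinfo -v da0     # reported sector size and media size of the new disk
    gpart show da0      # current partition layout, to compare against the old drive's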

To get out of the situation (a FreeBSD-flavored sketch of these steps follows the list):

  • get the name of the failed device from zpool status
  • use diskinfo to identify the physical location of the UNAVAIL drive noted above
  • reconfigure it with cfgadm -c unconfigure and cfgadm -c configure
  • bring the new disk online with zpool online <pool> <device>
  • run zpool replace <pool> <old-device> <new-device> to replace the disk (zpool status <pool> should then show it as online)
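
On FreeBSD, a rough equivalent of these steps might look like the following; camcontrol rescan stands in for cfgadm here, and the pool/device names are taken from the question rather than from anything tested:

    zpool status aldan                     # note the GUID of the UNAVAIL member
    diskinfo -v da0                        # identify the physical disk
    camcontrol rescan all                  # re-probe the buses (rough cfgadm equivalent)
    zpool online aldan /dev/da0p1          # try to bring the member back online
    zpool replace aldan 8160797608248051182 /dev/da0p1   # or replace it outright
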
Overmind
  • What is "cfgadm"? – Jim L. Oct 21 '19 at 22:17
  • `cfgadm` is a SunOS utility... I wish @Overmind would use the FreeBSD equivalent in his otherwise nicely-detailed answer... – Mikhail T. Oct 22 '19 at 00:56
  • You can use atacontrol; don't mind the actual command names, as they can vary. What matters is the logic -- the actual proper steps to take. – Overmind Oct 23 '19 at 07:02
  • Even unplugging the new drive completely does not help -- both pools remain "`UNAVAIL`" and attempts to change them in any way are met with a "pool is unavailable" message. – Mikhail T. Nov 06 '19 at 14:05