I just replaced a hard drive that was part of two different redundant pools, and now both pools are unavailable...
Details:
- There are four drives: 2x4TB (da0 and ada1) and 2x3TB (da1 and da2).
- One pool is a RAIDZ1 consisting of both of the 3TB drives in their entireties and the 3TB parts of the 4TB drives.
- The other pool is a mirror consisting of the remaining space of the two bigger drives.
- I replaced one of the 4TB drives with another of the same size (da0)...
I expected both pools to go into "degraded" mode until I split the replacement into the two parts and added each part to its pool.
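The procedure I had in mind was roughly this (a sketch only -- the exact partition sizes and alignment would have to match the old drive's layout):

    # partition the new 4TB disk like the old one (sizes here are illustrative)
    gpart create -s GPT da0
    gpart add -t freebsd-zfs -a 1m -s 2794G da0   # the ~3TB piece for the RAIDZ1
    gpart add -t freebsd-zfs -a 1m da0            # the remainder for the mirror
    # splice each new partition into the pool that lost the old one
    zpool replace aldan da0p1
    zpool replace lusterko da0p2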
Instead, the computer rebooted unceremoniously and, upon coming back, both pools are "unavailable":
  pool: aldan
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        aldan                       UNAVAIL      0     0     0
          raidz1-0                  UNAVAIL      0     0     0
            1257549909357337945     UNAVAIL      0     0     0  was /dev/ada1p1
            1562878286621391494     UNAVAIL      0     0     0  was /dev/da1
            8160797608248051182     UNAVAIL      0     0     0  was /dev/da0p1
            15368186966842930240    UNAVAIL      0     0     0  was /dev/da2
        logs
          4588208516606916331       UNAVAIL      0     0     0  was /dev/ada0e
  pool: lusterko
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        lusterko                    UNAVAIL      0     0     0
          mirror-0                  UNAVAIL      0     0     0
            623227817903401316      UNAVAIL      0     0     0  was /dev/ada1p2
            7610228227381804026     UNAVAIL      0     0     0  was /dev/da0p2
I have now split the new drive, but attempts to "zpool replace" are rebuffed with "pool is unavailable" (an example is below). I'm pretty sure that, if I simply disconnect the new drive, both pools will become OK (if degraded). Why are they both "unavailable" now? All of the devices are online, according to camcontrol:
<ATA TOSHIBA MG03ACA4 FL1A> at scbus0 target 0 lun 0 (pass0,da0)
<ATA Hitachi HUS72403 A5F0> at scbus0 target 1 lun 0 (pass1,da1)
<ATA TOSHIBA HDWD130 ACF0> at scbus0 target 2 lun 0 (pass2,da2)
<M4-CT128M4SSD2 0309> at scbus1 target 0 lun 0 (pass3,ada0)
<MB4000GCWDC HPGI> at scbus2 target 0 lun 0 (pass4,ada1)
The OS is FreeBSD-11.3-STABLE/amd64. What's wrong?
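For concreteness, this is the kind of attempt that gets rejected (the GUIDs are taken from the zpool status output above):

    # error text reproduced from memory, may not be verbatim
    zpool replace aldan 8160797608248051182 da0p1
    cannot open 'aldan': pool is unavailable
    zpool replace lusterko 7610228227381804026 da0p2
    cannot open 'lusterko': pool is unavailable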
Update: no, I didn't explicitly offline the device(s) before unplugging the disk -- and it is already on its way back to Amazon. I'm surprised such offlining is necessary -- shouldn't ZFS be able to handle the sudden death of any drive? And shouldn't it, likewise, be prepared for a technician replacing the failed drive with another? Why is it throwing a fit like this?
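If such offlining really is required, I presume the proper sequence would have been something like this (my understanding of the recommended procedure, not what I actually did):

    zpool offline aldan da0p1       # before pulling the disk
    zpool offline lusterko da0p2
    # ...swap the disk physically, repartition it, then:
    zpool replace aldan da0p1
    zpool replace lusterko da0p2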
I have backups and can rebuild the pools from scratch -- but I'd like to figure out how to avoid doing this. Or, if that is not possible, to file a proper bug report...
I unplugged the new drive completely, but the pools' status hasn't changed... Maybe I need to reboot -- whether or not that helps, it is quite a disappointment.
Update 2: multiple reboots, with and without the new disk attached, did not help. However, zpool import lists both pools just as I'd expect them: degraded (but available!). For example:
   pool: lusterko
     id: 11551312344985814621
  state: DEGRADED
 status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
         fault tolerance of the pool may be compromised if imported.
    see: http://illumos.org/msg/ZFS-8000-2Q
 config:

        lusterko                    DEGRADED
          mirror-0                  DEGRADED
            ada1p2                  ONLINE
            12305582129131953320    UNAVAIL  cannot open
But zpool status continues to insist that all devices are unavailable... Any hope?
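As far as I understand, zpool import scans the on-disk labels while zpool status reports the kernel's current view of the pools, so perhaps an explicit export followed by a fresh import would reconcile the two. Something like the following -- though I haven't dared to try it yet, and I'm not even sure the export will succeed while the pools are UNAVAIL:

    # untested -- just the cycle I'm considering
    zpool export -f aldan
    zpool export -f lusterko
    zpool import aldan
    zpool import lusterko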