I recently had a disk fail, got a replacement disk, swapped it in, and completed the resilvering process successfully. I was able to use the raidz2 pool without issues via Samba, but when I then ran the clear command that zfs prompted me to, it "froze" with no output. I left it running overnight; since it still showed nothing in the morning, I tried to terminate it, which also did nothing, so I just closed my PuTTY SSH connection. Since then I have been unable to access the data or import the pool, no matter what I tried.
The system runs Ubuntu (host name UBUNTU-SERVER-KVM), and I have another, virtual Ubuntu installed on it (host name ubuntu-server) which usually accesses the ZFS data via Samba shares (I never figured out how to make the pool directly accessible inside the virtual machine). I suspect I may have done part of the ZFS work on the main ubuntu-kvm host and part of it on the virtual Ubuntu installation. I fear this caused my problems, and now I am unable to import the pool in any way. The ubuntu-kvm machine's zpool.cache shows the "old" pool setup, with the dead drive and not the new one, which seems to confirm that I mistakenly ran the resilver on the virtual ubuntu machine instead of on the ubuntu-kvm host that usually had the zpool.
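One thing I was planning to try is moving the stale cache file aside so the next import has to scan the disks instead of trusting the outdated layout (untested; I'm assuming the default Ubuntu path /etc/zfs/zpool.cache):

```shell
# Move the stale cache file aside so zpool stops trusting the old layout.
# /etc/zfs/zpool.cache is the usual default location on Ubuntu.
CACHE=/etc/zfs/zpool.cache
if [ -f "$CACHE" ]; then
    sudo mv "$CACHE" "$CACHE.bak" && echo "moved $CACHE aside"
else
    echo "no cache file at $CACHE"
fi
# Then retry the import, which scans /dev by default:
# sudo zpool import tank
```

The zpool import itself is left commented out since I haven't dared to run this end to end yet.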
I have a feeling that if I could somehow correct the paths the pool is trying to import from, my data would still be there, since the resilvering process completed and I was able to access everything afterwards. The virtual Ubuntu refers to its drives as "/dev/vd*", while the ubuntu-kvm host running the virtual machine shows them as "/dev/sd*".
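My rough idea for working around the path mismatch (untested; the directory name is just a placeholder I made up) was to collect stable by-id links for the member partitions in one directory and restrict zpool's device scan to it, so it shouldn't matter whether the kernel calls the disks sdX or vdX:

```shell
# Placeholder sketch: gather by-id symlinks for the member partitions in one
# directory, then point zpool import's device scan at only that directory.
DEVDIR=/tmp/tank-devs
mkdir -p "$DEVDIR"
for dev in /dev/disk/by-id/ata-WDC_WD60EFRX-*-part1; do
    # Guard against an unexpanded glob on machines without these disks:
    [ -e "$dev" ] && ln -sf "$dev" "$DEVDIR/"
done
# Read-only import while experimenting, so nothing gets written to the pool:
# sudo zpool import -d "$DEVDIR" -o readonly=on -f tank
```

Again, the import line is commented out because I haven't tried this yet.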
Does anyone have any ideas what I can do next to recover the data? Yes, I do have the most critical parts backed up in the cloud, but there is plenty of other data on the pool that I'd much rather not lose :)
Here's some info that's hopefully helpful:
import attempts
user@UBUNTU-SERVER-KVM ~> sudo zpool import
pool: tank
id: 3866261861707315207
state: FAULTED
status: One or more devices were being resilvered.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
config:
tank FAULTED corrupted data
raidz2-0 DEGRADED
sdc ONLINE
sdf ONLINE
replacing-2 DEGRADED
352909589583250342 OFFLINE
sde ONLINE
sda ONLINE
sdd ONLINE
sdb ONLINE
user@UBUNTU-SERVER-KVM ~> sudo zpool import -f tank
cannot import 'tank': I/O error
Destroy and re-create the pool from
a backup source.
user7@UBUNTU-SERVER-KVM ~> sudo zpool import -d /dev/disk/by-id/ tank
cannot import 'tank': I/O error
Destroy and re-create the pool from
a backup source.
zdb
user@UBUNTU-SERVER-KVM ~> sudo zdb
tank:
version: 5000
name: 'tank'
state: 0
txg: 15981041
pool_guid: 3866261861707315207
errata: 0
hostname: 'UBUNTU-SERVER-KVM'
vdev_children: 1
vdev_tree:
type: 'root'
id: 0
guid: 3866261861707315207
children[0]:
type: 'raidz'
id: 0
guid: 2520364627045826300
nparity: 2
metaslab_array: 34
metaslab_shift: 38
ashift: 12
asize: 36006962135040
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 4891017201165304687
path: '/dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-***-part1'
whole_disk: 1
DTL: 241
create_txg: 4
children[1]:
type: 'disk'
id: 1
guid: 7457881130536207675
path: '/dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-***-part1'
whole_disk: 1
DTL: 240
create_txg: 4
children[2]:
type: 'disk'
id: 2
guid: 352909589583250342
path: '/dev/vde1'
whole_disk: 1
not_present: 1
DTL: 159
create_txg: 4
children[3]:
type: 'disk'
id: 3
guid: 10598130582029967766
path: '/dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-***-part1'
whole_disk: 1
DTL: 239
create_txg: 4
children[4]:
type: 'disk'
id: 4
guid: 1949004718048415909
path: '/dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-***-part1'
whole_disk: 1
DTL: 238
create_txg: 4
children[5]:
type: 'disk'
id: 5
guid: 13752847360965334531
path: '/dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-***-part1'
whole_disk: 1
DTL: 237
create_txg: 4
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
zdb -l /dev/sde1
user@UBUNTU-SERVER-KVM ~> sudo zdb -l /dev/sde1
--------------------------------------------
LABEL 0
--------------------------------------------
version: 5000
name: 'tank'
state: 0
txg: 15981229
pool_guid: 3866261861707315207
errata: 0
hostname: 'ubuntu-server'
top_guid: 2520364627045826300
guid: 1885359927145031384
vdev_children: 1
vdev_tree:
type: 'raidz'
id: 0
guid: 2520364627045826300
nparity: 2
metaslab_array: 34
metaslab_shift: 38
ashift: 12
asize: 36006962135040
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 4891017201165304687
path: '/dev/vdc1'
whole_disk: 1
DTL: 241
create_txg: 4
children[1]:
type: 'disk'
id: 1
guid: 7457881130536207675
path: '/dev/vdf1'
whole_disk: 1
DTL: 240
create_txg: 4
children[2]:
type: 'replacing'
id: 2
guid: 9514120513744452300
whole_disk: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 352909589583250342
path: '/dev/vde1/old'
whole_disk: 1
not_present: 1
DTL: 159
create_txg: 4
offline: 1
children[1]:
type: 'disk'
id: 1
guid: 1885359927145031384
path: '/dev/vde1'
whole_disk: 1
DTL: 231
create_txg: 4
resilver_txg: 15981226
children[3]:
type: 'disk'
id: 3
guid: 10598130582029967766
path: '/dev/vdg1'
whole_disk: 1
DTL: 239
create_txg: 4
children[4]:
type: 'disk'
id: 4
guid: 1949004718048415909
path: '/dev/vdd1'
whole_disk: 1
DTL: 238
create_txg: 4
children[5]:
type: 'disk'
id: 5
guid: 13752847360965334531
path: '/dev/vdb1'
whole_disk: 1
DTL: 237
create_txg: 4
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
--------------------------------------------
LABEL 1
--------------------------------------------
version: 5000
name: 'tank'
state: 0
txg: 15981229
pool_guid: 3866261861707315207
errata: 0
hostname: 'ubuntu-server'
top_guid: 2520364627045826300
guid: 1885359927145031384
vdev_children: 1
vdev_tree:
type: 'raidz'
id: 0
guid: 2520364627045826300
nparity: 2
metaslab_array: 34
metaslab_shift: 38
ashift: 12
asize: 36006962135040
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 4891017201165304687
path: '/dev/vdc1'
whole_disk: 1
DTL: 241
create_txg: 4
children[1]:
type: 'disk'
id: 1
guid: 7457881130536207675
path: '/dev/vdf1'
whole_disk: 1
DTL: 240
create_txg: 4
children[2]:
type: 'replacing'
id: 2
guid: 9514120513744452300
whole_disk: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 352909589583250342
path: '/dev/vde1/old'
whole_disk: 1
not_present: 1
DTL: 159
create_txg: 4
offline: 1
children[1]:
type: 'disk'
id: 1
guid: 1885359927145031384
path: '/dev/vde1'
whole_disk: 1
DTL: 231
create_txg: 4
resilver_txg: 15981226
children[3]:
type: 'disk'
id: 3
guid: 10598130582029967766
path: '/dev/vdg1'
whole_disk: 1
DTL: 239
create_txg: 4
children[4]:
type: 'disk'
id: 4
guid: 1949004718048415909
path: '/dev/vdd1'
whole_disk: 1
DTL: 238
create_txg: 4
children[5]:
type: 'disk'
id: 5
guid: 13752847360965334531
path: '/dev/vdb1'
whole_disk: 1
DTL: 237
create_txg: 4
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
--------------------------------------------
LABEL 2
--------------------------------------------
version: 5000
name: 'tank'
state: 0
txg: 15981229
pool_guid: 3866261861707315207
errata: 0
hostname: 'ubuntu-server'
top_guid: 2520364627045826300
guid: 1885359927145031384
vdev_children: 1
vdev_tree:
type: 'raidz'
id: 0
guid: 2520364627045826300
nparity: 2
metaslab_array: 34
metaslab_shift: 38
ashift: 12
asize: 36006962135040
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 4891017201165304687
path: '/dev/vdc1'
whole_disk: 1
DTL: 241
create_txg: 4
children[1]:
type: 'disk'
id: 1
guid: 7457881130536207675
path: '/dev/vdf1'
whole_disk: 1
DTL: 240
create_txg: 4
children[2]:
type: 'replacing'
id: 2
guid: 9514120513744452300
whole_disk: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 352909589583250342
path: '/dev/vde1/old'
whole_disk: 1
not_present: 1
DTL: 159
create_txg: 4
offline: 1
children[1]:
type: 'disk'
id: 1
guid: 1885359927145031384
path: '/dev/vde1'
whole_disk: 1
DTL: 231
create_txg: 4
resilver_txg: 15981226
children[3]:
type: 'disk'
id: 3
guid: 10598130582029967766
path: '/dev/vdg1'
whole_disk: 1
DTL: 239
create_txg: 4
children[4]:
type: 'disk'
id: 4
guid: 1949004718048415909
path: '/dev/vdd1'
whole_disk: 1
DTL: 238
create_txg: 4
children[5]:
type: 'disk'
id: 5
guid: 13752847360965334531
path: '/dev/vdb1'
whole_disk: 1
DTL: 237
create_txg: 4
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
--------------------------------------------
LABEL 3
--------------------------------------------
version: 5000
name: 'tank'
state: 0
txg: 15981229
pool_guid: 3866261861707315207
errata: 0
hostname: 'ubuntu-server'
top_guid: 2520364627045826300
guid: 1885359927145031384
vdev_children: 1
vdev_tree:
type: 'raidz'
id: 0
guid: 2520364627045826300
nparity: 2
metaslab_array: 34
metaslab_shift: 38
ashift: 12
asize: 36006962135040
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 4891017201165304687
path: '/dev/vdc1'
whole_disk: 1
DTL: 241
create_txg: 4
children[1]:
type: 'disk'
id: 1
guid: 7457881130536207675
path: '/dev/vdf1'
whole_disk: 1
DTL: 240
create_txg: 4
children[2]:
type: 'replacing'
id: 2
guid: 9514120513744452300
whole_disk: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 352909589583250342
path: '/dev/vde1/old'
whole_disk: 1
not_present: 1
DTL: 159
create_txg: 4
offline: 1
children[1]:
type: 'disk'
id: 1
guid: 1885359927145031384
path: '/dev/vde1'
whole_disk: 1
DTL: 231
create_txg: 4
resilver_txg: 15981226
children[3]:
type: 'disk'
id: 3
guid: 10598130582029967766
path: '/dev/vdg1'
whole_disk: 1
DTL: 239
create_txg: 4
children[4]:
type: 'disk'
id: 4
guid: 1949004718048415909
path: '/dev/vdd1'
whole_disk: 1
DTL: 238
create_txg: 4
children[5]:
type: 'disk'
id: 5
guid: 13752847360965334531
path: '/dev/vdb1'
whole_disk: 1
DTL: 237
create_txg: 4
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
EDIT1: Using an undocumented option (the V in the -fFV below; I was unable to find out what it actually does), I was apparently able to import the pool. However, zpool iostat shows an empty table; I cannot run a scrub, as it says "pool is currently unavailable"; zpool history returns nothing; "zdb -u tank" returns "zdb: can't open 'tank': Input/output error"; and I cannot detach the old dead hard drive, as the pool is reported unavailable. The action text has also changed from "cannot be imported due to damaged devices or data" to "Wait for the resilver to complete", but the resilver is running at "1/s" and the config list shows none of the drives as resilvering. This has been going on for several days now without any change in the resilver count or percentage.
user@ubuntu-server ~> sudo zpool import -fFV
pool: tank
id: 3866261861707315207
state: FAULTED
status: One or more devices were being resilvered.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
config:
tank FAULTED corrupted data
raidz2-0 DEGRADED
vdc ONLINE
vdf ONLINE
replacing-2 DEGRADED
352909589583250342 OFFLINE
vde ONLINE
vdg ONLINE
vdd ONLINE
vdb ONLINE
user@ubuntu-server ~> sudo zpool import -fFV tank
user@ubuntu-server ~> sudo zpool status
pool: tank
state: FAULTED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Thu Nov 1 21:02:15 2018
31.0T scanned out of 31.0T at 1/s, (scan is slow, no estimated time)
5.17T resilvered, 100.02% done
config:
NAME STATE READ WRITE CKSUM
tank FAULTED 0 0 1 corrupted data
raidz2-0 DEGRADED 0 0 6
vdc ONLINE 0 0 0
vdf ONLINE 0 0 0
replacing-2 DEGRADED 0 0 0
352909589583250342 OFFLINE 0 0 0 was /dev/vde1/old
vde ONLINE 0 0 0
vdg ONLINE 0 0 0
vdd ONLINE 0 0 0
vdb ONLINE 0 0 0
user@ubuntu-server /tank> sudo zpool detach tank 352909589583250342
cannot open 'tank': pool is unavailable
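For completeness, these are the next things I'm considering (both untested, and I'd run them from the VM where the resilver actually happened, where the member disks appear as vdX):

```shell
# Last-resort ideas, sketched only. The `|| true` just lets the sketch
# continue past a failed attempt instead of aborting.

# 1) Read-only import, so the stuck resilver state can't be made any worse:
sudo zpool import -o readonly=on -f tank || true

# 2) Extreme rewind (-X together with -F) rolls the pool back to an older
#    transaction group; zpool(8) documents it as a last resort that can
#    discard the most recent writes:
sudo zpool import -fFX tank || true
```

If anyone knows whether the rewind option is safe to attempt on a pool stuck mid-replace like this, I'd appreciate the advice before I run it.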