I have the following zpool:
    NAME                        STATE     READ WRITE CKSUM
    zfspool                     ONLINE       0     0     0
      mirror-0                  ONLINE       0     0     0
        wwn-0x5000cca266f3d8ee  ONLINE       0     0     0
        wwn-0x5000cca266f1ae00  ONLINE       0     0     0
This morning the host experienced an event that I am still investigating: load was very high and many services weren't working, although I could still log in.
On reboot, the host hung during boot, waiting on services that rely on data in the above pool.
Suspecting an issue with the pool, I removed one of the drives and rebooted again. The host came online this time.
A scrub showed that all the data on the remaining disk was fine. After the scrub completed, I reinserted the drive I had removed. The drive began resilvering, but it gets about 4% through and then restarts.
smartctl shows no issues with either drive (no errors logged, WHEN_FAILED empty).
However, I can't tell which disk is resilvering; in fact, it looks like the pool is fine and doesn't need to be resilvered at all.
errors: No known data errors
root@host1:/var/log# zpool status
  pool: zfspool
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Dec 8 12:20:53 2019
        46.7G scanned at 15.6G/s, 45.8G issued at 15.3G/s, 5.11T total
        0B resilvered, 0.87% done, 0 days 00:05:40 to go
config:

        NAME                        STATE     READ WRITE CKSUM
        zfspool                     ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x5000cca266f3d8ee  ONLINE       0     0     0
            wwn-0x5000cca266f1ae00  ONLINE       0     0     0

errors: No known data errors
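
For reference, here is a rough Python sketch of how I checked the config section for a resilvering device. It assumes that a device being resilvered carries a trailing note such as "(resilvering)" after its counters, which is what I would expect zpool status to print. The names parse_vdevs and flagged_vdevs are my own, not part of any ZFS tooling, and the sample text is just the config section pasted from above.

```python
# Sample input: the config section of the `zpool status` output above.
# Indentation is not significant for this sketch.
STATUS = """\
config:
NAME STATE READ WRITE CKSUM
zfspool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wwn-0x5000cca266f3d8ee ONLINE 0 0 0
wwn-0x5000cca266f1ae00 ONLINE 0 0 0
errors: No known data errors
"""


def parse_vdevs(status_text):
    """Return one dict per row of the config section (hypothetical helper)."""
    rows = []
    in_config = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("config:"):
            in_config = True
            continue
        if line.startswith("errors:"):
            break
        if not in_config or not line or line.startswith("NAME"):
            continue
        # name, state, read, write, cksum, then an optional trailing note
        parts = line.split(None, 5)
        if len(parts) < 5:
            continue
        rows.append({
            "name": parts[0],
            "state": parts[1],
            "read": parts[2],
            "write": parts[3],
            "cksum": parts[4],
            "note": parts[5] if len(parts) > 5 else "",
        })
    return rows


def flagged_vdevs(rows):
    """Rows that are not ONLINE, have non-zero counters, or carry a note."""
    return [
        r for r in rows
        if r["state"] != "ONLINE"
        or r["note"]
        or any(r[k] != "0" for k in ("read", "write", "cksum"))
    ]


if __name__ == "__main__":
    for row in flagged_vdevs(parse_vdevs(STATUS)):
        print(row["name"], row["state"], row["note"])
```

Run against the output above, this prints nothing: no row is flagged, which matches what I see by eye.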
What is the best way to get out of this resilvering loop? Other answers suggest detaching the drive that is being resilvered, but as noted above, neither drive appears to be resilvering.
Edit:
zpool events shows about 1000 repetitions of the following:
Dec 8 2019 13:22:12.493980068 sysevent.fs.zfs.resilver_start
        version = 0x0
        class = "sysevent.fs.zfs.resilver_start"
        pool = "zfspool"
        pool_guid = 0x990e3eff72d0c352
        pool_state = 0x0
        pool_context = 0x0
        time = 0x5ded4d64 0x1d7189a4
        eid = 0xf89

Dec 8 2019 13:22:12.493980068 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "zfspool"
        pool_guid = 0x990e3eff72d0c352
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "host1"
        history_internal_str = "func=2 mintxg=7381953 maxtxg=9049388"
        history_internal_name = "scan setup"
        history_txg = 0x8a192e
        history_time = 0x5ded4d64
        time = 0x5ded4d64 0x1d7189a4
        eid = 0xf8a

Dec 8 2019 13:22:17.485979213 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "zfspool"
        pool_guid = 0x990e3eff72d0c352
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "host1"
        history_internal_str = "errors=0"
        history_internal_name = "scan aborted, restarting"
        history_txg = 0x8a192f
        history_time = 0x5ded4d69
        time = 0x5ded4d69 0x1cf7744d
        eid = 0xf8b

Dec 8 2019 13:22:17.733979170 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "zfspool"
        pool_guid = 0x990e3eff72d0c352
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "host1"
        history_internal_str = "errors=0"
        history_internal_name = "starting deferred resilver"
        history_txg = 0x8a192f
        history_time = 0x5ded4d69
        time = 0x5ded4d69 0x2bbfa222
        eid = 0xf8c

Dec 8 2019 13:22:17.733979170 sysevent.fs.zfs.resilver_start
        version = 0x0
        class = "sysevent.fs.zfs.resilver_start"
        pool = "zfspool"
        pool_guid = 0x990e3eff72d0c352
        pool_state = 0x0
        pool_context = 0x0
        time = 0x5ded4d69 0x2bbfa222
        eid = 0xf8d
...
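
To measure how long each scan survives before it is aborted, I put together a small Python sketch that pairs each "scan setup" event with the next "scan aborted, restarting" event. It assumes the two hex words in the time field are seconds since the epoch and nanoseconds, which matches the header timestamps above once the timezone is accounted for. The function names are my own, and the sample is a trimmed copy of the events shown above.

```python
import re

# Trimmed copy of the events above; only the fields the sketch needs.
SAMPLE = """\
Dec 8 2019 13:22:12.493980068 sysevent.fs.zfs.history_event
history_internal_str = "func=2 mintxg=7381953 maxtxg=9049388"
history_internal_name = "scan setup"
time = 0x5ded4d64 0x1d7189a4
Dec 8 2019 13:22:17.485979213 sysevent.fs.zfs.history_event
history_internal_str = "errors=0"
history_internal_name = "scan aborted, restarting"
time = 0x5ded4d69 0x1cf7744d
Dec 8 2019 13:22:17.733979170 sysevent.fs.zfs.history_event
history_internal_str = "errors=0"
history_internal_name = "starting deferred resilver"
time = 0x5ded4d69 0x2bbfa222
"""

# Header line: "<Mon> <day> <year> <hh:mm:ss.nnnnnnnnn> <event class>"
HEADER = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{4}\s+\d{2}:\d{2}:\d{2}\.\d+\s+(\S+)$"
)
# Field line: "<key> = <value>"
FIELD = re.compile(r"^(\w+)\s*=\s*(.+)$")


def parse_events(text):
    """Split the event listing into one dict of fields per event."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            events.append({"class": header.group(1)})
            continue
        field = FIELD.match(line)
        if field and events:
            events[-1][field.group(1)] = field.group(2).strip('"')
    return events


def event_ns(event):
    """Event time in integer nanoseconds (assumed: seconds word, then ns word)."""
    sec_hex, nsec_hex = event["time"].split()
    return int(sec_hex, 16) * 10**9 + int(nsec_hex, 16)


def scan_lifetimes_ns(events):
    """Nanoseconds between each 'scan setup' and the abort that follows it."""
    lifetimes = []
    started = None
    for ev in events:
        name = ev.get("history_internal_name")
        if name == "scan setup":
            started = event_ns(ev)
        elif name == "scan aborted, restarting" and started is not None:
            lifetimes.append(event_ns(ev) - started)
            started = None
    return lifetimes


if __name__ == "__main__":
    for ns in scan_lifetimes_ns(parse_events(SAMPLE)):
        print(f"scan lasted {ns / 1e9:.3f} s before being aborted")
```

For the events shown above, the scan lasts roughly five seconds between setup and abort, so the resilver appears to be restarted almost immediately rather than failing partway through.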