
I have a zpool that's now showing a FAULTED state after a power outage rebooted the server and device names got shuffled around. I understand I messed up by adding vdevs by device name, but I didn't know the names could change after creating and populating the pool. Currently the pool is looking for '/dev/sde' (instead of '/dev/sdc'), but '/dev/sde' is the device the root fs is on, and I'm not sure how to approach recovery without potentially making matters worse.

Hoping there is a way to simply reconfigure the zpool to use 'sdc', so I can export and re-import with 'by-id' naming to avoid this in the future.
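
One way to confirm which vdev label each renamed device actually carries is to read the labels directly with zdb (the partition path here matches my lsblk output below):

zdb -l /dev/sdc1    # dumps the ZFS label: pool guid, vdev guid, and the path recorded at creation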

root@boxey:~# zpool status
  pool: data
 state: FAULTED
status: One or more devices could not be used because the label is missing
        or invalid.  There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from
        a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        UNAVAIL      0     0     0  insufficient replicas
          raidz1-0  UNAVAIL      0     0     0  insufficient replicas
            sda     ONLINE       0     0     0
            sde     UNAVAIL      0     0     0
            sdd     FAULTED      0     0     0  corrupted data
            sdb     FAULTED      0     0     0  corrupted data


root@boxey:~# lsblk --output=NAME,FSTYPE,LABEL,SIZE,UUID
NAME   FSTYPE     LABEL        SIZE UUID
sda                            2.7T
├─sda1 zfs_member data         2.7T 166412156792699288
└─sda9                           8M
sdb                            2.7T
├─sdb1 zfs_member data         2.7T 166412156792699288
└─sdb9                           8M
sdc                            2.7T
├─sdc1 zfs_member data         2.7T 166412156792699288
└─sdc9                           8M
sdd                            2.7T
├─sdd1 zfs_member data         2.7T 166412156792699288
└─sdd9                           8M
sde                           55.9G
├─sde1 ext4                   53.6G f2d9733a-846d-48c2-bb63-c7f4e0345ad5
├─sde2                           1K
└─sde5 swap                    2.3G 6c34cadd-db42-4e14-a647-733e021c018e

root@boxey:~# zpool export -f data
cannot export 'data': one or more devices is currently unavailable

root@boxey:~# zdb
data:
    version: 5000
    name: 'data'
    state: 0
    txg: 23300202
    pool_guid: 166412156792699288
    errata: 0
    hostid: 8323329
    hostname: 'boxey'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 166412156792699288
        children[0]:
            type: 'raidz'
            id: 0
            guid: 1294813595973345307
            nparity: 1
            metaslab_array: 34
            metaslab_shift: 36
            ashift: 12
            asize: 12002313371648
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 39
            children[0]:
                type: 'disk'
                id: 0
                guid: 4873892069497714664
                path: '/dev/sda1'
                whole_disk: 1
                DTL: 101
                create_txg: 4
                com.delphix:vdev_zap_leaf: 40
            children[1]:
                type: 'disk'
                id: 1
                guid: 16241503886070383904
                path: '/dev/sde1'
                whole_disk: 1
                DTL: 100
                create_txg: 4
                com.delphix:vdev_zap_leaf: 41
            children[2]:
                type: 'disk'
                id: 2
                guid: 1910545688695459106
                path: '/dev/sdd1'
                whole_disk: 1
                DTL: 99
                create_txg: 4
                com.delphix:vdev_zap_leaf: 42
            children[3]:
                type: 'disk'
                id: 3
                guid: 2766829802450425996
                path: '/dev/sdb1'
                whole_disk: 1
                DTL: 98
                create_txg: 4
                com.delphix:vdev_zap_leaf: 43
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

1 Answer


If the issue is related to device names only, you can re-import the pool with stable names by issuing:

zpool import data -d /dev/disk/by-id/
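
For a pool that can still be exported cleanly, the usual sequence to move to stable names is the following sketch (your pool currently refuses to export, so the single import command above is the relevant step here):

zpool export data                       # release all devices
zpool import -d /dev/disk/by-id data    # re-import, recording the by-id paths in the pool config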

However, you have two disks with FAULTED/corrupted data, which makes me think you somehow messed with the data themselves, rather than "only" with the device names.

Please try the command suggested above and share any other output.

EDIT: the OP commented below:

root@boxey:~# zpool import data -d /dev/disks/by-id/
cannot import 'data': a pool with that name already exists
use the form 'zpool import <pool | id> <newpool>' to give it a new name

root@boxey:~# zpool import data data2 -t
cannot import 'data': no such pool available

This means the FAULTED vdevs were discovered as such (by the system) during normal pool operation. Please do the following (a combined sketch follows the list):

  • remove the cache file /etc/zfs/zpool.cache
  • reboot the system
  • show the output of zpool status before trying any import
  • if zpool status shows no pool imported, try issuing zpool import data -d /dev/disk/by-id/
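
Put together, the sequence above looks like this (the cache file path is the ZFS on Linux default):

rm /etc/zfs/zpool.cache                  # drop the stale cache so devices are re-scanned at import
reboot
# after the reboot:
zpool status                             # confirm no pool was auto-imported
zpool import data -d /dev/disk/by-id/    # import, recording stable by-id device paths
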
shodanshok
  • I'd rather use `/dev/disk/by-id` as the paths can also change if drives are moved around in the chassis. – Michael Hampton Feb 16 '21 at 13:44
  • @MichaelHampton I generally prefer to use `by-path` because it makes it easier to identify any disk even if the WWID label is removed/lost. But sure, if drive slots/cables were mangled, `by-id` should be better. I will update my answer. – shodanshok Feb 16 '21 at 13:46
  • root@boxey:~# zpool import data -d /dev/disks/by-id/ cannot import 'data': a pool with that name already exists use the form 'zpool import <pool | id> <newpool>' to give it a new name root@boxey:~# zpool import data data2 -t cannot import 'data': no such pool available – I haven't messed with the dataset at all; I'd been reading a lot of options before asking this question. Can only think the corruption occurred when the electrician dropped power. Was hoping it was a missing device that caused this report. – tango Feb 16 '21 at 13:49
  • @tango I've updated my answer, check the new instructions. – shodanshok Feb 16 '21 at 13:58
  • I've updated it also, as there was a typo. – Michael Hampton Feb 16 '21 at 14:08
  • @shodanshok That totally worked. Pool online with no data errors. The only extra thing I did was remove an auto-mounting rc.d script prior to reboot. Thank you. – tango Feb 16 '21 at 14:23
  • @tango excellent! Be sure to scrub your pool as soon as possible. – shodanshok Feb 16 '21 at 14:41
  • @shodanshok Good idea, scrub is underway – tango Feb 16 '21 at 14:45
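
For reference, the scrub mentioned in the closing comments uses the standard commands:

zpool scrub data     # start a full scrub of the pool
zpool status data    # reports scrub progress and any repaired errors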