I believe I messed up when replacing a failed drive in my ZFS RAIDZ2 pool. I think I forgot to offline the failed drive before running the replace command, and it now seems to have created a temporary mirror inside the pool. Any tips on how I can correct this without destroying the pool?

NAME                                 STATE     READ WRITE CKSUM  
storage                              DEGRADED     0     0     1  
  raidz2-0                           DEGRADED     0     0     2  
    replacing-0                      DEGRADED     0     0     0  
      10188385608277313659           UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F4X76S-part1  
      sda                            ONLINE       0     0     0  
    ata-ST3000DM001-1CH166_W1F5C09L  ONLINE       0     0     0  
    ata-ST3000DM001-1CH166_Z1F4C9ZF  ONLINE       0     0     0  
    ata-ST3000DM001-1CH166_Z1F50YNJ  ONLINE       0     0     0  
    sde                              ONLINE       0     0     0  
    sdf                              ONLINE       0     0     0  

Here is the full output of `zpool status -v`:

  pool: storage
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 131G in 11h41m with 1 errors on Sat Mar 16 09:33:47 2019
config:

        NAME                                 STATE     READ WRITE CKSUM
        storage                              DEGRADED     0     0     1
          raidz2-0                           DEGRADED     0     0     2
            replacing-0                      DEGRADED     0     0     0
              10188385608277313659           UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F4X76S-part1
              sda                            ONLINE       0     0     0
            ata-ST3000DM001-1CH166_W1F5C09L  ONLINE       0     0     0
            ata-ST3000DM001-1CH166_Z1F4C9ZF  ONLINE       0     0     0
            ata-ST3000DM001-1CH166_Z1F50YNJ  ONLINE       0     0     0
            sde                              ONLINE       0     0     0
            sdf                              ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        storage:/plex_db_backup/com.plexapp.plugins.library.db-2017-09-21
Michael Hampton
JCDenton
  • Please show the commands you ran. Can you do a `zpool history storage`? – ewwhite Mar 17 '19 at 14:46
  • I should mention that I physically removed the dead drive first, put in the new drive, and ran the following command: `zpool replace storage 10188385608277313659 /dev/sda -f` – JCDenton Mar 17 '19 at 14:48
  • Just run `zpool clear storage`. – ewwhite Mar 17 '19 at 15:02
  • Warning: You should not add disks to a zpool using `/dev/sd*`. These device names may change and you may end up with a broken zpool later. Use the names in `/dev/disk/by-id` instead, as you have already done with some of the vdevs. – Michael Hampton Mar 17 '19 at 15:27
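
To act on the `/dev/disk/by-id` advice, it helps to see which persistent names point at a given kernel device such as `/dev/sda`. The following is a minimal sketch, assuming a Linux system where udev populates `/dev/disk/by-id` with symlinks. The function name `by_id_names` and its optional directory argument are made up for illustration and are not part of ZFS.

```shell
#!/bin/sh
# by_id_names DEVICE [DIR]
# Print every symlink in DIR (default: /dev/disk/by-id) that resolves to DEVICE.
# Both the function name and the DIR argument are hypothetical helpers.
by_id_names() {
    target=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}

# Example: by_id_names /dev/sda
```

Running it against `/dev/sda`, `/dev/sde`, and `/dev/sdf` should show which `ata-...` names correspond to the three vdevs that currently appear under their `sd*` names.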

1 Answer

This may just be the replacement and resilvering process running.

Please show the full output of `zpool status -v`.

Okay, just run `zpool clear storage`.

After that, check that `zpool status` shows a resilver in progress.
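
The check suggested above can be scripted. This is a rough sketch, assuming the status text looks like the output in the question and that a running resilver is reported on the `scan:` line as "resilver in progress". The function name `replace_state` and its three labels are invented here for illustration.

```shell
#!/bin/sh
# replace_state: read `zpool status` text on stdin and print one label:
#   resilvering - the scan line reports a resilver in progress
#   replacing   - no resilver running, but a replacing-N vdev is still listed
#   clean       - no replacing-N vdev is listed
# The function name and labels are hypothetical, not ZFS terminology.
replace_state() {
    awk '
        /scan:.*resilver in progress/ { running = 1 }
        $1 ~ /^replacing-[0-9]+$/     { replacing = 1 }
        END {
            if (running)        print "resilvering"
            else if (replacing) print "replacing"
            else                print "clean"
        }
    '
}

# Typical use, after clearing the error counters:
#   zpool clear storage
#   zpool status -v storage | replace_state
```

If it still prints `replacing` once the resilver has finished, the old device entry has not gone away on its own.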

ewwhite
  • Thank you. I have tried running that in the past to no avail. It clears the checksum errors and starts resilvering, but once it is done the old drive is still listed as unavailable. (PS: I have just run the clear and it is now resilvering. It is a 12 TB array, so it will be about 12 hours until it is done.) – JCDenton Mar 17 '19 at 15:05
  • The duplicate drive entry goes away when the disk is done resilvering. – ewwhite Mar 17 '19 at 17:04
  • Well the resilvering is complete and the pool is the same as above (resilvered 131G in 11h39m with 1 errors on Sun Mar 17 20:42:12 2019) – JCDenton Mar 18 '19 at 21:07
  • You can remove the file indicated in the error. Do you have a replacement? – ewwhite Mar 18 '19 at 22:22
  • The file is not there. It was in a dataset that I destroyed, I think around the time one of my drives failed. Unfortunately, I do not have another replacement right now either. – JCDenton Mar 19 '19 at 02:14
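
Since the thread turns on which file the pool reports as damaged, here is a small sketch for pulling that list out of the status text. It assumes the `errors:` section is laid out as in the question, with the header at the start of a line and one indented path per line after it. The function name `permanent_errors` is invented for illustration.

```shell
#!/bin/sh
# permanent_errors: read `zpool status -v` text on stdin and print the paths
# listed under the "errors: Permanent errors ..." header, one per line.
# The function name is a hypothetical helper, not a ZFS command.
permanent_errors() {
    awk '
        /^errors: Permanent errors/ { in_list = 1; next }
        in_list && NF { sub(/^[ \t]+/, ""); print }
    '
}

# Typical use:
#   zpool status -v storage | permanent_errors
```

For the output shown in the question, this prints the single `storage:/plex_db_backup/...` entry, which can then be compared against the datasets that still exist.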