
I run ZFS on Linux (CentOS 7) with a raidz1 pool, and I am in trouble: two of its disks are dying.

The first disk shows a Raw_Read_Error_Rate failure in SMART and a Reallocated_Sector_Ct error (the disk has run out of spare sectors for relocating data).

The second disk has Reallocated_Sector_Ct errors too, but still has spare sectors left.

I replaced the first disk with a new one, and ZFS started resilvering. At the beginning the speed was ~2MB/s, but after some time it dropped to 20KB/s or even less, and it has stayed that low for several days!

smartd also keeps logging more errors:

Jul  9 06:14:09 shaggycat-desktop smartd[966]: Device: /dev/sdf [SAT], FAILED SMART self-check. BACK UP DATA NOW!
Jul  9 06:14:11 shaggycat-desktop smartd[966]: Device: /dev/sdf [SAT], 488 Currently unreadable (pending) sectors
Jul  9 06:14:11 shaggycat-desktop smartd[966]: Device: /dev/sdf [SAT], 107 Offline uncorrectable sectors
Jul  9 06:44:08 shaggycat-desktop smartd[966]: Device: /dev/sdf [SAT], FAILED SMART self-check. BACK UP DATA NOW!
Jul  9 06:44:12 shaggycat-desktop smartd[966]: Device: /dev/sdf [SAT], 488 Currently unreadable (pending) sectors
Jul  9 06:44:12 shaggycat-desktop smartd[966]: Device: /dev/sdf [SAT], 107 Offline uncorrectable sectors

Rebooting and re-importing the pool did not help.

Can I use dd_rescue to copy the second failing disk to a new one and trick zpool into accepting the copy? How can I do that and then import the pool with the new disk? I use /dev/disk/by-id/ names to identify the disks in my zpool.

  pool: tank                                                                                                                                                                                                         
 state: DEGRADED                                                                                                                                                                                                     
status: One or more devices is currently being resilvered.  The pool will                                                                                                                                            
        continue to function, possibly in a degraded state.                                                                                                                                                          
action: Wait for the resilver to complete.                                                                                                                                                                           
  scan: resilver in progress since Sun Jul  5 15:16:17 2015                                                                                                                                                          
    59.2G scanned out of 1.70T at 81.3K/s, (scan is slow, no estimated time)                                                                                                                                         
    14.8G resilvered, 3.40% done                                                                                                                                                                                     
config:                                                                                                                                                                                                              

        NAME                                                  STATE     READ WRITE CKSUM                                                                                                                             
        tank                                                  DEGRADED     0     0     0                                                                                                                             
          raidz1-0                                            DEGRADED     0     0     0                                                                                                                             
            ata-Hitachi_HDS721010CLA332_JP2940HQ2VTTDH-part1  ONLINE       0     0     0                                                                                                                             
            replacing-1                                       DEGRADED     0     0     1                                                                                                                             
              4455585976361728304                             UNAVAIL      0     0     0  was /dev/disk/by-id/ata-Hitachi_HDS721010CLA332_JP2940HQ2VTZUH-part1                                                       
              ata-ST1000DM003-1ER162_W4Y1HJTP-part1           ONLINE       0     0     0  (resilvering)                                                                                                              
            ata-WDC_WD10EALS-00Z8A0_WD-WCATR1714802-part1     ONLINE       0     0     0                                                                                                                             
            ata-WDC_WD10EALS-00Z8A0_WD-WCATR1737637-part1     ONLINE       0     0     0      
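
For scale, here is a rough estimate of how long the resilver would take at the current rate. It is only a sketch built from the figures in the status output above, assuming zpool reports sizes in powers of 1024 and that the rate stays constant; `eta_days` is a made-up helper name.

```shell
# Figures copied from the `zpool status` output above.
total_tib=1.70     # total data to scan, in TiB (assumed binary units)
scanned_gib=59.2   # already scanned, in GiB
rate_kib=81.3      # current scan rate, in KiB per second

# Remaining days = remaining KiB / (KiB per second) / (seconds per day)
eta_days() {
  awk -v t="$1" -v s="$2" -v r="$3" 'BEGIN {
    remaining_kib = (t * 1024 - s) * 1024 * 1024
    printf "%.0f\n", remaining_kib / r / 86400
  }'
}

eta_days "$total_tib" "$scanned_gib" "$rate_kib"
```

At 81.3K/s this comes out to roughly 250 days, which is why simply waiting does not look like an option.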


zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank  3.56T  1.70T  1.86T         -      -    47%  1.54x  DEGRADED  -

I use these software versions:

zfs-release-1-2.el7.centos.noarch
libzfs2-0.6.4.1-1.el7.centos.x86_64
zfs-0.6.4.1-1.el7.centos.x86_64
zfs-dkms-0.6.4.1-1.el7.centos.noarch

1 Answer


EDIT: I first thought this was a mirror pool, not raidz.

First, zpool replace should work fine. If the resilver is slow because the dying disk responds slowly, you can offline it first (or detach it, in a mirror) so that the data is rebuilt from the other disks and no reads are attempted from the bad one. In a raidz1 vdev this only works while every other disk is still readable, so having multiple failing disks is not a good situation.

Exporting the pool, copying the disk with ddrescue, and then importing the pool should also work, as long as the old dying disk has been removed from the machine before the import. Import generally scans all disks to see which pools can be found and imported, so the original and its copy should not both be present at that point.
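
The export, clone, import sequence described above could be sketched as follows. This is an outline under assumptions, not a tested procedure: it uses GNU ddrescue (not the separate dd_rescue tool), `/dev/sdf` is taken from the smartd log in the question, and `/dev/sdX`, `rescue.map` and the `clone_plan` helper are placeholders. The function only prints the commands so they can be reviewed before anything is run by hand.

```shell
# Print the clone-and-reimport steps instead of executing them.
# Arguments: pool name, failing disk, new disk, ddrescue map file.
clone_plan() {
  pool=$1 failing=$2 fresh=$3 map=$4
  echo "zpool export $pool"
  # -f: allow overwriting a block device; -n: skip the slowest recovery phase
  echo "ddrescue -f -n $failing $fresh $map"
  # Second run: retry the remaining bad areas up to 3 times, reusing the map
  echo "ddrescue -f -r3 $failing $fresh $map"
  echo "# power off and physically remove $failing before importing"
  echo "zpool import -d /dev/disk/by-id $pool"
}

# Placeholder device names; double-check which disk is which before cloning.
clone_plan tank /dev/sdf /dev/sdX rescue.map
```

Cloning the whole disk rather than only the partition also copies the partition table, so the new disk has to be at least as large as the old one. Sectors that ddrescue cannot read end up as holes in the copy, which ZFS would then have to repair from parity.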

And if you have multiple failing drives, there is nothing wrong with replacing them at the same time. It is generally faster, since a single resilver pass then covers all of the disks. When you replace a disk that is still online, it keeps being used as a read/write target until the replace completes.
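
Starting several replacements back to back might look like the sketch below. The device names are placeholders (use the by-id names shown by zpool status), and `replace_plan` is a hypothetical helper that only prints one `zpool replace` command per old:new pair for review.

```shell
# Print a `zpool replace` command for each old:new pair given after the pool.
replace_plan() {
  pool=$1; shift
  for pair in "$@"; do
    # ${pair%%:*} is the part before the colon, ${pair#*:} the part after it
    echo "zpool replace $pool ${pair%%:*} ${pair#*:}"
  done
}

# Placeholder names, not real devices from this pool.
replace_plan tank old-disk-1:new-disk-1 old-disk-2:new-disk-2
```

This needs a free port and a new disk for every drive being replaced, since the old drives stay attached until the resilver finishes.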

stox