
I was trying to add a new HDD in place of a faulty HDD, but the new HDD cannot sync with the old one. The sync process runs up to 30% and then stops.

cat /proc/mdstat
Personalities : [raid1] 

md2 : active raid1 sda3[0] sdb3[2](S)
      1458319504 blocks super 1.0 [2/1] [U_]

md1 : active raid1 sda2[3] sdb2[2]
      524276 blocks super 1.0 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[2]
      6291444 blocks super 1.0 [2/2] [UU]

md0 and md1 sync successfully, but md2 does not.

Here are the details:

mdadm --detail /dev/md2
/dev/md2:
        Version : 1.0
  Creation Time : Fri May 24 11:22:21 2013
     Raid Level : raid1
     Array Size : 1458319504 (1390.76 GiB 1493.32 GB)
  Used Dev Size : 1458319504 (1390.76 GiB 1493.32 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Aug  4 22:08:23 2014
          State : clean, degraded 
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

           Name : rescue:2  (local to host rescue)
           UUID : 96b46a6c:f520938c:f94879df:27851e8a
         Events : 616

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       0        0        1      removed

       2       8       19        -      spare   /dev/sdb3

Is there any solution? I want to back up my data.

Bhavesh
  • http://superuser.com/q/429776/144961 – Michael Hampton Aug 05 '14 at 02:11
  • Does `dmesg|tail -20` reveal anything? – MadHatter Aug 05 '14 at 10:57
  • Kernel log (reformatted for readability) – Bhavesh Aug 06 '14 at 10:53:

    sd 0:0:0:0: [sda] Unhandled sense code
    sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
    Add. Sense: Unrecovered read error - auto reallocate failed
    md/raid1:md2: sda: unrecoverable I/O read error for block 896543360
    md: md2: recovery interrupted.
    md: recovery of RAID array md1
    md: using 128k window, over a total of 524276k.
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:0, o:1, dev:sda3
     disk 1, wo:1, o:1, dev:sdb3
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:0, o:1, dev:sda3

3 Answers


Sorry for the late arrival; I am surprised nobody answered this. There is even a link to a similar problem in the comments, but I doubt cables are in play in this case.

You started a sync to a new disk, but when the sync reached 30%, the source (the last remaining drive, which holds all the data) hit a read error. On a read error, the Linux MD RAID driver tries the read on the other component devices, but in this case there is no in-sync component device to read from, so it gives up: it stops the sync at the first such unrecoverable error and then restarts it from the beginning. Naturally, pulling the spare out and re-adding it won't help. You have to use other means to complete the sync, or otherwise retrieve the (slightly corrupted) data.
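If you want to see this loop in action, watching the rebuild alongside the kernel log makes it obvious; a small sketch, using the device names from the question:

# In one terminal, watch the rebuild counter climb and then reset:
watch -n 5 cat /proc/mdstat

# In another, look for the read error that aborts each attempt:
dmesg | grep -iE 'read error|recovery' | tail -20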

The system might appear to work perfectly, because this sector may hold no data and is therefore never read during normal operation; a RAID sync, however, is a special case in which everything gets read. Such cases are called silent bad blocks.

The first idea is to force the drive to remap the bad block internally. Unfortunately, there is no way to do this with a guarantee, but there is a high chance that if you write to that particular sector, it will get remapped and then read back successfully. To do that, you can use the hdparm utility (note that --repair-sector is an alias for --write-sector, and that hdparm insists on an extra safety flag before it will write):

hdparm --write-sector 448271680 --yes-i-know-what-i-am-doing /dev/sda

I deliberately put an almost random number here: it is 896543360/2, where the big number was taken from the dmesg error message. You have to calculate it yourself for your case. Be extremely careful. I suggest doing a read check (--read-sector) with the same number first, to trigger the same error message and thereby prove this is indeed the right sector. Note that you will lose whatever was in this sector, but it is unreadable anyway, so it is essentially lost already; and if it was silent, it held no useful information.
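For example, assuming the failing disk is /dev/sda and the arithmetic above fits your layout (hdparm addresses absolute LBAs on the whole disk, while the md message is relative to the partition, so an offset correction is likely needed), the read check is harmless and should reproduce the exact kernel error:

# Read-only and safe; an I/O error here proves the address is right:
hdparm --read-sector 448271680 /dev/sda

Repeat the same read after the write; if the drive has remapped the sector, it will now succeed.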

Repeat this for every unreadable block. You will need to replace this drive too once the sync completes.
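SMART counters help you judge how far gone the disk is and whether the remaps actually happened; a quick check, assuming smartmontools is installed:

# Pending sectors are silent bad blocks still waiting to be fixed;
# the reallocated count grows as writes remap them:
smartctl -A /dev/sda | grep -iE 'reallocated|pending'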

The other way to fix the situation requires stopping the service for an extended period of time. You need to stop the faulty RAID and run ddrescue from the faulty disk to a new disk. After that, first remove the old device completely and start the system from the new disk (with degraded arrays, I know). Then, if that works, add another new disk and complete the sync.
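A sketch of that path, with assumed device names (/dev/sda is the dying source, /dev/sdc a blank disk of at least the same size; adjust both for your system):

# From a rescue environment, stop all arrays so nothing touches the dying disk:
mdadm --stop --scan

# The first pass copies the easy areas quickly; the map file lets you
# interrupt and resume. The second pass retries the bad spots:
ddrescue -f -n /dev/sda /dev/sdc /root/rescue.map
ddrescue -f -r3 /dev/sda /dev/sdc /root/rescue.map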

In case you wondered: I happen to have done successful repairs both ways.

The lesson here is that just having a RAID is not enough; for your data to be safe you need to monitor the array's health, scrub it periodically (i.e., perform a read check of all devices and compare them, to be sure every block gets read) and, of course, take the required actions in good time. Hardware RAIDs also have facilities for automatic periodic scrubbing. For each MD RAID array, you should run this once a month:

echo check > /sys/block/md0/md/sync_action

(Debian does this by default, AFAIK.) That way, when some disk develops a silent unreadable sector, you will discover it within a month. Then don't forget to replace the dying disk as soon as possible!
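A cron entry can automate this; a minimal sketch (the file name and schedule are just examples) that checks every md array on the first of each month:

# /etc/cron.d/mdcheck: start a read check of all arrays at 01:00
# on the first day of every month.
0 1 1 * * root for f in /sys/block/md*/md/sync_action; do echo check > "$f"; done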

Nikita Kipriyanov

The mdadm switch --grow should pull the spare into the array, something like:

mdadm --grow /dev/md2 --raid-devices=3

(note that --grow operates on the array device, not on /dev/sdb3). If that fails, I'd tail syslog to find out why.
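Something along these lines, assuming the log lives at /var/log/syslog on this distribution:

# Follow the log while retrying the command; md prints the reason there:
tail -f /var/log/syslog | grep -iE 'md[0-9]+|raid'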

/dev/sdb3 is still marked as a spare, hence the (S). Re-adding it should do the job:

mdadm --manage /dev/md2 --add /dev/sdb3

If that is not enough, you have a few options. First, remove it and try re-adding it:

mdadm --manage /dev/md2 --remove /dev/sdb3
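After the remove, add it back with the same --add command as above and check that a recovery line reappears:

mdadm --manage /dev/md2 --add /dev/sdb3
cat /proc/mdstat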

You may also want to stop and reassemble the array (mdadm has no --start; a stopped array is brought back by assembling it):

mdadm --stop /dev/md2 ; mdadm --assemble /dev/md2 /dev/sda3 /dev/sdb3

And your last option would be to force the resync (don't worry, it's not destructive):

mdadm --assemble --run --force --update=resync /dev/md2 /dev/sda3 /dev/sdb3

Often, just restarting the array is enough to do the job without further hassle. And there is more: you can even re-create the whole thing with mdadm --create, though only with exactly the same layout and options, or you will lose the data. ;)

runlevel0