# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 nbd0[3] sda10[0]
      53246315 blocks super 1.2 [3/1] [U__]
      [>....................]  recovery =  1.0% (537088/53246315) finish=203.0min speed=4326K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

# nbd-client -d /dev/nbd0
Disconnecting: que, disconnect, sock, done

# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 nbd0[3](F) sda10[0]
      53246315 blocks super 1.2 [3/1] [U__]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

# mdadm /dev/md/raidy --remove /dev/nbd0
mdadm: hot removed /dev/nbd0 from /dev/md/raidy

# nbd-client 10.99.99.250 7777 /dev/nbd0
Negotiation: ..size = 53247411KB
bs=1024, sz=53247411

# mdadm --incremental --run /dev/nbd0
mdadm: /dev/nbd0 attached to /dev/md/raidy which is already active.

# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 nbd0[3] sda10[0]
      53246315 blocks super 1.2 [3/1] [U__]
      [>....................]  recovery =  0.0% (31616/53246315) finish=196.2min speed=4516K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

# uname -a
Linux vi-notebook 2.6.35-zen2-08220-g2c56b9e #14 ZEN PREEMPT Thu Oct 21 02:48:18 EEST 2010 i686 GNU/Linux

# mdadm --version
mdadm - v3.1.4 - 31

How do I properly disconnect a device from the RAID-1 array and reconnect it so that the write-intent bitmap is actually used and a full resync is avoided?

Experimenting again:

Personalities : [raid1] 
md0 : active raid1 nbd0[3] sda10[0]
      53246315 blocks super 1.2 [3/2] [UU_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

    /dev/md/raidy:
        Version : 1.2
  Creation Time : Tue Mar 30 05:42:53 2010
     Raid Level : raid1
     Array Size : 53246315 (50.78 GiB 54.52 GB)
  Used Dev Size : 53246315 (50.78 GiB 54.52 GB)
   Raid Devices : 3
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Jan 31 18:18:03 2011
          State : active, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : vi-notebook:0  (local to host vi-notebook)
           UUID : bc325b24:fa0a4957:47820c56:fc818fa3
         Events : 2661551

    Number   Major   Minor   RaidDevice State
       0       8       10        0      active sync   /dev/sda10
       3      43        0        1      active sync   /dev/nbd0
       2       0        0        2      removed

Now removing one of the devices:

# mdadm /dev/md/raidy --fail /dev/nbd0 
mdadm: set /dev/nbd0 faulty in /dev/md/raidy
# mdadm /dev/md/raidy --remove /dev/nbd0 
mdadm: hot removed /dev/nbd0 from /dev/md/raidy

Now re-adding it:

# mdadm --incremental --run /dev/nbd0

It starts resyncing from the beginning:

Personalities : [raid1] 
md0 : active raid1 nbd0[3] sda10[0]
      53246315 blocks super 1.2 [3/1] [U__]
      [>....................]  recovery =  0.4% (244480/53246315) finish=289.5min speed=3050K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>


/dev/md/raidy:
        Version : 1.2
  Creation Time : Tue Mar 30 05:42:53 2010
     Raid Level : raid1
     Array Size : 53246315 (50.78 GiB 54.52 GB)
  Used Dev Size : 53246315 (50.78 GiB 54.52 GB)
   Raid Devices : 3
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Jan 31 18:22:07 2011
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           Name : vi-notebook:0  (local to host vi-notebook)
           UUID : bc325b24:fa0a4957:47820c56:fc818fa3
         Events : 2661666

    Number   Major   Minor   RaidDevice State
       0       8       10        0      active sync   /dev/sda10
       3      43        0        1      spare rebuilding   /dev/nbd0
       2       0        0        2      removed
Vi.

3 Answers


You should use "--re-add" to add the removed disk back, like this:

# mdadm /dev/md0 --re-add /dev/sdf2

I just tried it, and it worked without a rebuild - provided the disk had been removed beforehand using "--remove", as you did.
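
Adapted to the nbd setup from the question (the array name, the device and the nbd endpoint are taken from the outputs above; this is a sketch of the cycle, not something I have run against that exact array), the procedure would look roughly like this. First fail, remove and disconnect the member cleanly:

# mdadm /dev/md/raidy --fail /dev/nbd0
# mdadm /dev/md/raidy --remove /dev/nbd0
# nbd-client -d /dev/nbd0

Later, reconnect the export and re-add the member; only the blocks marked dirty in the bitmap since the removal should need resyncing:

# nbd-client 10.99.99.250 7777 /dev/nbd0
# mdadm /dev/md/raidy --re-add /dev/nbd0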

Note that this is important - the disk has to have been removed using "--remove". If you just pull the disk out hard (without removing it first), plug it back in and try to "--re-add" it, you will get:

mdadm: --re-add for /dev/sdf2 to /dev/md0 is not possible

Why is that? Looking at http://linux.die.net/man/8/mdadm, section "--re-add":

If [...] the slot that it used is still vacant, then the device will be added back to the array in the same position.

If you just pulled the disk out, the slot will still be occupied by the failed disk (marked F in /proc/mdstat):

$ cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 loop0[0] loop3[2](F)
      1047552 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

Remove the failed disk and the "--re-add" will work, taking advantage of the bitmap.
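
For the loop-device array shown above, the sequence would be (a sketch along the same lines):

$ mdadm /dev/md0 --remove /dev/loop3
$ mdadm /dev/md0 --re-add /dev/loop3

Removing /dev/loop3 vacates its slot, so the subsequent "--re-add" can put it back in the same position and resync only the dirty bitmap chunks.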

Jakob
  • You are probably using only 2 disks. I was using 3 devices, of which one was always online and the other two were only available periodically. For this use case you need two bitmaps (tracking the sectors still pending to be written to the second device and to the third). With only one bitmap you can't efficiently re-add a device to an always-degraded RAID. Multiple bitmaps are a feature that is unlikely to be implemented - it is an optimisation for the use case where the RAID stays degraded for a prolonged time. – Vi. Feb 27 '14 at 14:09

Hrm. Looking at the outputs above, it isn't clear that you ever had multiple synced disks in the first place. It looks like there was a failed disk that was syncing, which was removed, then re-added, and then was resyncing again. At no point do I see a display that shows two disks fully synced.

I would make sure both disks are active, let them fully sync up, and only after verifying that would I attempt to remove a disk and re-add it.

Note that the write-intent bitmap is only an aid for quickly resyncing disks which are already nearly in sync (i.e. after a system crash, or when a disk is removed for a short period of time). It isn't intended for long-term removal, or for disks that aren't already fully synced.
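
To check that before pulling a disk, watch /proc/mdstat until the recovery line disappears, and inspect the array and the bitmap itself - something along these lines (device names taken from your outputs; the last command reads the internal bitmap from a member device and reports how many chunks are still dirty):

# cat /proc/mdstat
# mdadm --detail /dev/md/raidy
# mdadm --examine-bitmap /dev/sda10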

Christopher Cashell
  • OK, I'll experiment with full resync and immediate remove-add. – Vi. Jan 31 '11 at 01:36
  • Can't the write-intent bitmap be used to help with, for example, _unreliable_ re-synchronisation (one that gets interrupted often)? – Vi. Jan 31 '11 at 01:37
  • I believe it would help during resynchronization, but only after a full sync. My understanding of the write-intent bitmap is that once the disks are all fully synced, it tracks where writes are being made, and regularly flushes itself after periods of inactivity (when the disks are all bit-for-bit synced). If a disk is removed, the write-intent bitmap is marked so that when/if the disk is replaced, only those changes made since its removal need to be synced. If a disk is removed before a full sync, I think the bitmap will still be used to try to complete the sync. – Christopher Cashell Jan 31 '11 at 16:07
  • Experimented again: after a full sync I removed the device and immediately re-added it. The result is the same: it starts from the beginning. – Vi. Jan 31 '11 at 16:27

I am not sure if this helps, but I think your problem is the nbd device.

If you want to do RAID-1 across an IP network, why don't you use drbd?
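
For two nodes, a minimal resource definition plus bring-up could look roughly like this (hostnames, disks and addresses are placeholders, not taken from your setup, and the exact syntax depends on the DRBD version). Put something like this in /etc/drbd.d/raidy.res on both nodes:

resource raidy {
  protocol C;
  on node-a {
    device    /dev/drbd0;
    disk      /dev/sdX1;
    address   10.99.99.1:7789;
    meta-disk internal;
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/sdY1;
    address   10.99.99.250:7789;
    meta-disk internal;
  }
}

Then, on both nodes:

# drbdadm create-md raidy
# drbdadm up raidy

and promote the node that holds the good data to primary (the exact command differs between DRBD versions); the replicated device then appears as /dev/drbd0.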

Nils
  • I think the problem is that there are multiple removed members and only one write-intent bitmap, so if one of the members is almost constantly missing, the write-intent bitmap is not cleared while resyncing. The solution would be a separate write-intent bitmap for each missing member. – Vi. May 06 '11 at 11:24
  • With a non-nbd setup it is the same. – Vi. May 06 '11 at 11:24
  • Can DRBD 1. handle multiple local data sources (like RAID) and resync them, and 2. cope with the case when most nodes are offline (and efficiently resync them when they come back online)? – Vi. May 06 '11 at 11:31