
I'm pretty new to DRBD and NFS, and I'm in the process of testing a DRBD server with Heartbeat to use for our company's NFS share.

The entire setup is running fine, with the NFS state directory on the DRBD share together with the actual share.

The problem I'm having comes from the failover scenario I'm testing. In that scenario, I simulate a failure by disabling the network connection (ifconfig eth0 down) on node1. The failover itself works great and does its job in about 5 seconds, but when I bring the node back up (ifconfig eth0 up, and service heartbeat start if it has stopped), it takes upward of 3-5 minutes to fail back, during which the NFS share is unavailable.

In a web environment, 3-5 minutes of downtime is pretty significant. Is this normal? What am I doing wrong?

I pasted our drbd.conf file below.

resource r0 {
  protocol C;
  startup {
    degr-wfc-timeout 60;    # 1 minute.
  }
  disk {
    on-io-error   detach;
  }
  net {
  }
  syncer {
    rate 10M;
    al-extents 257;
  }
  on tsa-dev-nfstest1 {              # ** EDIT ** the hostname of server 1 (uname -n)
    device     /dev/drbd0;
    disk       /dev/sdc1;            # ** EDIT ** data partition on server 1
    address    10.61.2.176:7788;     # ** EDIT ** IP address on server 1
    meta-disk  /dev/sdb1[0];         # ** EDIT ** 128MB partition for DRBD on server 1
  }
  on tsa-dev-nfstest2 {              # ** EDIT ** the hostname of server 2 (uname -n)
    device    /dev/drbd0;
    disk      /dev/sdc1;             # ** EDIT ** data partition on server 2
    address   10.61.2.177:7788;      # ** EDIT ** IP address on server 2
    meta-disk /dev/sdb1[0];          # ** EDIT ** 128MB partition for DRBD on server 2
  }
}
Willemk

2 Answers


While developing a DRBD-backed, highly available NFS server for my company, I found that clients saw a few minutes (up to about 10) of downtime when the clustered IP moved back to the original node after a test. In that situation new connections were accepted and served immediately, but already-connected clients experienced those minutes of downtime.

After examining the network traffic with tcpdump, I found that the problem was TCP connections whose sequence numbers had gone out of sync and needed to be reset.
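If you want to see the same thing in your setup, a capture along these lines on the active node during fail-back will show the retransmissions and resets (the interface name, the clustered IP and the standard NFS port 2049 are assumptions here; adjust them to your environment):

  # watch NFS traffic to the clustered service IP while the resources fail back
  tcpdump -ni eth0 host 10.61.2.180 and port 2049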

I'd suggest you use Pacemaker instead of plain Heartbeat to manage the cluster. In real-life failure situations Pacemaker can issue STONITH (Shoot The Other Node In The Head) requests, which prevents this situation from happening: it simply reboots the failed node, and that clears the stale TCP state. A sketch of such a configuration follows.
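As a rough, hypothetical sketch of such a Pacemaker setup in crm shell syntax (the resource names, the mount point /srv/nfs, the nfs-kernel-server init script and the service IP 10.61.2.180 are assumptions, not taken from the question):

  # hypothetical crm configuration for a DRBD-backed NFS export
  primitive p_drbd_r0 ocf:linbit:drbd params drbd_resource="r0" op monitor interval="15s"
  ms ms_drbd_r0 p_drbd_r0 meta master-max="1" clone-max="2" notify="true"
  primitive p_fs ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/srv/nfs" fstype="ext3"
  primitive p_ip ocf:heartbeat:IPaddr2 params ip="10.61.2.180" cidr_netmask="24"
  primitive p_nfs lsb:nfs-kernel-server
  group g_nfs p_fs p_nfs p_ip
  colocation c_nfs_on_master inf: g_nfs ms_drbd_r0:Master
  order o_drbd_first inf: ms_drbd_r0:promote g_nfs:start
  # stonith-enabled only helps if you also configure an actual STONITH device
  property stonith-enabled="true"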

Also, Pacemaker is much better than Heartbeat at monitoring. Have a look at these sites:

Pacemaker

NFS over DRBD from Linbit

user842313

Does your Heartbeat resource group include a logical IP for your NFS service?

It should be your last resource that comes "up" and the first that goes "down". Your clients should use this IP for accessing the NFS-service.

If you have such an IP defined, you might try the other resource agent for the IP (IPaddr2, as far as I remember). It behaves a little differently on the IP stack.

Basically, both types should do an ARP broadcast after the IP comes up, to make sure the connected routers and switches relearn their MAC tables and know where to forward packets after the failover.
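If you are still using Heartbeat's haresources model, the ordering idea looks roughly like this (the mount point, filesystem type, NFS init script and service IP below are assumptions); resources are started left to right and stopped right to left, so the IPaddr2 entry comes up last and goes down first:

  # /etc/ha.d/haresources (hypothetical)
  tsa-dev-nfstest1 drbddisk::r0 Filesystem::/dev/drbd0::/srv/nfs::ext3 nfs-kernel-server IPaddr2::10.61.2.180/24/eth0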

On some NFS implementations you should also explicitly ARP your already-connected clients. For this you have to mirror the connected-client data to your standby node as well.
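A common way to get that state onto the standby node is to keep /var/lib/nfs on the replicated volume itself; a minimal sketch, assuming the DRBD filesystem is mounted at /srv/nfs:

  # on the primary, with the DRBD filesystem mounted
  mv /var/lib/nfs /srv/nfs/varlibnfs
  ln -s /srv/nfs/varlibnfs /var/lib/nfs
  # create the same symlink on the standby node so it uses the replicated state after failover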

Nils