
I have the following setup:

  • 2 nodes.
  • corosync + pacemaker + DRBD + OCFS2.

==============

After testing a network failure, the DRBD status becomes:

Primary/Unknown

dmesg shows the following:

Split-Brain detected but unresolved, dropping connection.

I can bring the DRBD status back to Primary/Primary manually with a few commands, but I want this to happen automatically.
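
For reference, the manual recovery I run looks roughly like this (DRBD 8.4 syntax; r0 is my resource, and which node plays the split-brain victim depends on whose changes can be discarded):

# On the split-brain victim (its changes since the split are thrown away):
drbdadm secondary r0
drbdadm connect --discard-my-data r0

# On the split-brain survivor, if it has dropped to StandAlone:
drbdadm connect r0

# After the resync completes, promote the victim again for dual-primary:
drbdadm primary r0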

My configuration (/etc/corosync/r0) is:

resource r0 {
    protocol C;
    startup {
       become-primary-on both;
    }
    disk {
            on-io-error     detach;
            fencing         resource-only;
            resync-rate     1000M;
    }
    handlers {
            split-brain             "/usr/lib/drbd/notify-split-brain.sh root";
            fence-peer              "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target     "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    net {
            allow-two-primaries yes;
            cram-hmac-alg sha1;
            shared-secret "DRBD Super Secret Password";
            timeout 180;
            ping-int 3;
            ping-timeout 9;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
    }
    on node01 {
            device /dev/drbd0;
            address 192.168.64.128:7788;
            meta-disk internal;
            disk /dev/mapper/SSDVolume-VMData;
    }
    on node02 {
            device /dev/drbd0;
            address 192.168.64.129:7788;
            meta-disk internal;
            disk /dev/mapper/SSDVolume-VMData;
    }

}
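
To sanity-check the resource file and watch the state during failover testing, these standard DRBD commands should work (a quick sketch, assuming nothing beyond the resource name r0):

drbdadm dump r0       # parse the config and print what DRBD actually sees
drbdadm cstate r0     # connection state, e.g. Connected / StandAlone
drbdadm role r0       # e.g. Primary/Primary when both sides are up
cat /proc/drbd        # full kernel-side status, including resync progress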


The output of `crm configure show` is:

primitive drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=20 role=Master timeout=30 \
    op monitor interval=30 role=Slave timeout=40
primitive filesys Filesystem \
    params device="/dev/drbd0" directory="/vmstore" fstype=ocfs2 options="rw,noatime" \
    op start interval=0 timeout=60 \
    op stop interval=0 timeout=60
primitive virtual_ip IPaddr2 \
    params ip=192.168.38.10 cidr_netmask=32 \
    op monitor interval=10s \
    meta migration-threshold=10
ms ms_drbd_r0 drbd_r0 \
    meta master-max=2 master-node-max=1 notify=true
clone filesys_clone filesys \
    meta interleave=true
colocation col_filesys_clone-on-drbd_master inf: filesys_clone ms_drbd_r0:Master
order filesys_clone-after-drbd_master inf: ms_drbd_r0:promote filesys_clone:start
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.14-70404b0 \
    cluster-infrastructure=corosync \
    cluster-name=debian \
    stonith-enabled=false \
    no-quorum-policy=ignore
rsc_defaults rsc-options: \
    resource-stickiness=100
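
To confirm Pacemaker accepted all of this, a quick check (standard Pacemaker tools, nothing specific to this setup) is:

crm_verify -L -V    # validate the live CIB for configuration errors
crm_mon -1          # one-shot status; ms_drbd_r0 should show two Masters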

My question is:

What should I do to get the DRBD status back to Primary/Primary automatically?

Thanks in advance.

1 Answer


When running DRBD in dual-primary, any interruption to the replication network will result in a split-brain. You'll need to resolve this manually by following the steps in the user guide here: https://docs.linbit.com/doc/users-guide-84/s-resolve-split-brain/

More importantly, you're running DRBD in dual-primary without STONITH and fencing. This is dangerous and will eventually fail you: you will either lose or corrupt your data. Using DRBD like this is neither advised nor supported.
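
For dual-primary, the DRBD side should also use fencing resource-and-stonith; (instead of resource-only) together with the crm-fence-peer.sh / crm-unfence-peer.sh handlers you already have. As a rough sketch only (the external/ipmi agent, BMC addresses, and credentials below are placeholders you must adapt to your hardware), the Pacemaker side could look something like:

# Hypothetical BMC IPs and credentials - replace with your own.
primitive stonith-node01 stonith:external/ipmi \
    params hostname=node01 ipaddr=192.168.64.201 userid=admin passwd=secret interface=lanplus \
    op monitor interval=60s
primitive stonith-node02 stonith:external/ipmi \
    params hostname=node02 ipaddr=192.168.64.202 userid=admin passwd=secret interface=lanplus \
    op monitor interval=60s
# A STONITH device must never run on the node it is meant to power off.
location l-stonith-node01 stonith-node01 -inf: node01
location l-stonith-node02 stonith-node02 -inf: node02
property stonith-enabled=true

Note that stonith-enabled=false and no-quorum-policy=ignore in your current configuration are exactly the settings that make this cluster unsafe.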

Good luck!

Dok
  • Thank you, that was very important advice. Today I have spent the whole day searching the internet for how to fence DRBD resources. I have tried the following methods: stonith:external/ssh, stonith:fence_ipmilan, stonith:external/ipmi, stonith:external/ibmrsa-telnet, but nothing has worked! Can you write out the steps or the crm configure code for fencing DRBD, please? – Saleh Alashkar Oct 12 '17 at 14:16
  • external/ssh shouldn't be used beyond testing purposes; you can't trust ssh to fence a stuck node. What if the OS is frozen and not responding to ssh requests? external/ipmi and fence_ipmilan are valid devices, but do the nodes support IPMI? Can you query the IPMI device via `ipmitool` (see the example below these comments)? Do the nodes have IBM RSA boards for the ibmrsa-telnet agent? STONITH agents are very hardware (or hypervisor) specific. – Dok Oct 12 '17 at 18:18
  • That means I should connect with the ipmitool command through the iLO port to the server (using the iLO IP address) and send the shutdown or reboot command, and when I use the stonith:external/ipmi agent, corosync runs this command automatically when the network connection fails. Now I have a clear picture of this. Thank you so much, that was really helpful. I will continue building the structure and post the result later. – Saleh Alashkar Oct 13 '17 at 07:38
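
As a quick illustration of the `ipmitool` check Dok mentions above (the BMC IP and credentials are placeholders):

ipmitool -I lanplus -H 192.168.64.201 -U admin -P secret chassis power status

If that returns something like "Chassis Power is on", the external/ipmi and fence_ipmilan agents should be able to reach the same interface.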