
My Debian 8.9 DRBD 8.4.3 setup has somehow gotten into a state where the two nodes can no longer connect over the network. They should replicate a single resource r1, but immediately after drbdadm down r1; drbdadm up r1 on both nodes, their /proc/drbd describes the situation as follows:

On the 1st node (its connection state is either WFConnection or StandAlone):

1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
   ns:0 nr:0 dw:0 dr:912 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:20

On the 2nd node:

1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
   ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:48

The two nodes can ping each other over the IP addresses cited in /etc/drbd.d/r1.res, and netstat shows that both are listening on the cited port.
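
For reference, the connectivity checks were along these lines (10.0.0.2 and 7789 below are placeholders; the real values come from /etc/drbd.d/r1.res):

# on each node, ping the peer address cited in r1.res
ping -c 3 10.0.0.2
# confirm the local DRBD listener is bound to the cited port
netstat -tln | grep 7789    # or: ss -tln | grep 7789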

How can I (further diagnose and) get out of this situation so that the two nodes can become Connected and replicate over DRBD again?

BTW, at a higher level of abstraction this problem currently manifests itself as systemctl start drbd never exiting, apparently because it gets stuck in drbdadm wait-connect all (as suggested by /lib/systemd/system/drbd.service).
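
A quick way to confirm that the start job really is stuck there (a sketch; process names may vary with the drbd-utils version):

# the start job should show as "activating"
systemctl status drbd
# and the hung helper should be visible as a drbdadm wait-connect process
ps -ef | grep '[w]ait-connect'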

rookie09

2 Answers


The situation was apparently caused by a case of split-brain.

I had not noticed this because I had only inspected recent journal entries for drbd.service (sudo journalctl -u drbd); the problem had apparently been reported in the kernel log instead, and slightly earlier (sudo journalctl | grep Split-Brain).
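
In other words, the unit-scoped query misses the report, while a search of the kernel messages finds it (a minimal sketch, assuming journald captures kernel output, which is the default on Debian 8):

# unit-scoped journal: no split-brain message here
sudo journalctl -u drbd
# kernel messages: this is where the Split-Brain report shows up
sudo journalctl -k | grep -i split-brain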

With that, manually resolving the split-brain (as described here or here) also cleared up the troublesome situation, as follows.

On the split-brain victim (assuming the DRBD resource is r1):

drbdadm disconnect r1
drbdadm secondary r1
drbdadm connect --discard-my-data r1

On the split-brain survivor:

drbdadm primary r1
drbdadm connect r1
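
After the connect on both sides, the victim should resync from the survivor; a quick way to verify on DRBD 8.4:

# cs: should move through SyncSource/SyncTarget and end up Connected,
# ds: should end up UpToDate/UpToDate
cat /proc/drbd
drbdadm cstate r1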
rookie09
    It's best to include your steps in your answer versus linking to a site that might move later. I imagine you just needed `drbdadm disconnect r1` on both nodes, then `drbdadm connect r1 --discard-my-data` on the victim, and `drbdadm connect r1` on the survivor. – Matt Kereczman Aug 25 '17 at 14:44
  • @MattKereczman Done now. – rookie09 Aug 31 '17 at 06:11

I use the following pattern. On the sick node (which is not the current DC; run pcs status to check):

drbdadm dump all
drbdadm disconnect resource
drbdadm secondary resource
drbdadm connect resource

On the healthy node (which is the current DC; again check with pcs status):

drbdadm dump all
drbdadm disconnect resource
drbdadm primary resource
drbdadm connect resource
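
To identify which node is the current DC in the first place, the Pacemaker status output names it directly (assuming a pcs-managed cluster, as this answer does):

# the "Current DC:" line names the designated controller
pcs status | grep 'Current DC'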
sysadmin1138