
I have a three-node cluster: machine001, machine002 and machine003.

machine001 runs two resources and machine002 runs one. Normally these resources do not run on the same host, unless machine002 goes into standby.

Recently, machine002 started appearing twice: once online and once offline.

Checking with `sudo crm_mon -R` showed that the two entries have different node IDs.

I tried deleting the entry by node ID, but that was refused. I tried deleting it by node name, but was told there is an active node with that name.

I then ran `sudo crm configure edit`, which showed the following configuration:

(111) machine001 \
    standby=off
(222) machine002 \
    standby=off
(333) machine003 \
    standby=off
(12345) machine002
other_settings... \

So I removed the line `(12345) machine002`, then saved and committed the CIB. After that, machine002 disappeared completely from the output of `crm_mon`, and the cluster appears to be constantly trying to find it again.

The only way to get it back is to restart corosync and pacemaker on that node.
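
For reference, the restart looks roughly like the sketch below. It assumes systemd units named `corosync` and `pacemaker`; `restart_cluster_stack` and the `DRY_RUN` switch are hypothetical names added for illustration:

```shell
# Restart the cluster stack on the local node.
# Assumption: systemd units are named "corosync" and "pacemaker".
# Set DRY_RUN=1 to print the commands instead of running them.
restart_cluster_stack() {
    local run="sudo"
    [ "${DRY_RUN:-0}" = "1" ] && run="echo sudo"
    # Pacemaker sits on top of corosync, so stop it first and start it last.
    $run systemctl stop pacemaker
    $run systemctl stop corosync
    $run systemctl start corosync
    $run systemctl start pacemaker
}
```

Running `DRY_RUN=1 restart_cluster_stack` only prints the four commands, which is a safe way to check the order before running it for real.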

I'm at a loss for what is going on here. Can anyone point me in the right direction?

EDIT: The requested corosync.conf file:

totem {
    version: 2

    cluster_name: debian
    token: 3000

    transport: udp

    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 3600
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    threads: 0
    rrp_mode: none

    crypto_cipher: none
    crypto_hash: none

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.0
        mcastaddr: 239.255.64.1
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no

    to_logfile: yes
    logfile: /var/log/corosync/corosync.log

    to_syslog: no

    syslog_facility: daemon

    debug: off

    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    expected_votes: 3
}

nodelist {
    node {
        ring0_addr: 192.168.0.25
        name: machine001
        id: 1
    }
    node {
        ring0_addr: 192.168.0.26
        name: machine002
        id: 2
    }
    node {
        ring0_addr: 192.168.0.27
        name: machine003
        id: 3
    }
}
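
One thing that can be checked mechanically in a file like this is which keys each `node { }` block uses. corosync.conf(5) documents the per-node identifier key as `nodeid`; whether the `id:` keys above are related to the duplicate entry is not established here. A minimal sketch, with `check_nodelist_keys` as a hypothetical helper name:

```shell
# Read a corosync.conf on stdin, list the keys in each node { } block,
# and flag blocks that contain no "nodeid" key.
check_nodelist_keys() {
    awk '
        /^[ \t]*node[ \t]*[{]/ { in_node = 1; keys = ""; has_nodeid = 0; next }
        in_node && /[}]/ {
            n++
            printf "node %d: keys=%s%s\n", n, keys, (has_nodeid ? "" : " (no nodeid)")
            in_node = 0
            next
        }
        in_node && /:/ {
            key = $1
            sub(/:$/, "", key)          # strip the trailing colon from the key
            keys = (keys == "" ? key : keys "," key)
            if (key == "nodeid") has_nodeid = 1
        }
    '
}
```

Usage, assuming the default path: `check_nodelist_keys < /etc/corosync/corosync.conf`.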

KoenDG
  • Did `machine002` happen to grab a new IP address at some point? Would you be able to add the `corosync.conf` used for the cluster? – Matt Kereczman May 01 '23 at 20:46
  • @MattKereczman No, the IP is static, as are the ones from the other machines. I've added the `corosync.conf` as requested. – KoenDG May 03 '23 at 13:42
