I was testing a few failover cases, and this was my initial setup:

maxctrl list servers
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬────────────┐
│ Server  │ Address        │ Port │ Connections │ State           │ GTID       │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ server1 │ XXX.XXX.XX.XXX │ 3306 │ 0           │ Slave, Running  │ 0-1-853336 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ server2 │ XXX.XXX.XX.XXX │ 3306 │ 0           │ Master, Running │ 0-1-853336 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ server3 │ XXX.XXX.XX.XXX │ 3306 │ 0           │ Slave, Running  │ 0-1-853336 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ server4 │ XXX.XXX.XX.XXX │ 3307 │ 0           │ Slave, Running  │ 0-1-853336 │
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴────────────┘

I shut down the master (server2) and one slave (server1), then started them again manually. The setup then became:

maxctrl list servers
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬────────────┐
│ Server  │ Address        │ Port │ Connections │ State           │ GTID       │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ server1 │ XXX.XXX.XX.XXX │ 3306 │ 0           │ Running         │ 0-1-853336 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ server2 │ XXX.XXX.XX.XXX │ 3306 │ 0           │ Running         │ 0-1-853336 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ server3 │ XXX.XXX.XX.XXX │ 3306 │ 0           │ Master, Running │ 0-1-853336 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ server4 │ XXX.XXX.XX.XXX │ 3307 │ 0           │ Slave, Running  │ 0-1-853336 │
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴────────────┘

Since auto_failover=true and auto_rejoin=true, server1 and server2 should rejoin as slaves, but their state stays at Running. I also tried rejoining them manually with the command maxctrl call command mariadbmon rejoin DatabaseMonitor server1, but it returns this error:

Error: Server at 127.0.0.1:8989 responded with status code 403 to `POST maxscale/modules/mariadbmon/rejoin?DatabaseMonitor&server1`:{
    "errors": [
        {
            "detail": "'server1' cannot replicate from master server 'server3': gtid_current_pos of 'server1' (0-1-853336) is incompatible with gtid_binlog_pos of 'server3' (0-200-3)."
        }
    ]
}

I'm sure I'm missing something about GTID replication, but I can't figure out what. Can anyone explain what's happening or how to fix it? Thanks.

Nandni

1 Answer

Make sure log_slave_updates is enabled on all your database nodes. It is required for both failover and switchover to work, because the binlog events must be available on every node.
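
One way to check the setting on each node is to query it with the command-line client. This is a minimal sketch, assuming the mariadb client is installed (older installations ship it as mysql) and that login credentials come from the client's option file. The function name check_log_slave_updates and the host name in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the global log_slave_updates value of one node.
# $1 = host, $2 = port. Credentials are assumed to come from the client's
# option file (e.g. ~/.my.cnf); nothing here is specific to MaxScale.
check_log_slave_updates() {
    mariadb --host="$1" --port="$2" --batch --skip-column-names \
        --execute="SELECT @@global.log_slave_updates"
}

# Usage (placeholder address): a result of 1 means the option is enabled.
#   check_log_slave_updates db1.example.com 3306
```

If a node reports 0, the option can be set in that server's configuration file and the server restarted, so that replicated events are also written to its own binlog.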

This might also be related to this bug report, which describes a similar situation. If no new transactions occur between the failover from one node to another, the rejoining nodes cannot join, because the gtid_binlog_pos of the new master server is not compatible with the gtid_current_pos of the old master server, exactly as the error message describes.
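
To read the two positions in the error message, it helps to remember that a MariaDB GTID has the form domain-server_id-sequence. The sketch below only splits the two values quoted in the error into those fields; the function name describe_gtid is made up for illustration, and it handles a single GTID, not a comma-separated list.

```shell
# Split one MariaDB GTID (domain-server_id-sequence) into labelled fields.
# Hypothetical helper for illustration; handles a single GTID only.
describe_gtid() {
    domain=${1%%-*}        # text before the first dash
    rest=${1#*-}           # everything after the first dash
    server_id=${rest%%-*}  # middle field
    seq=${rest#*-}         # last field
    echo "domain=$domain server_id=$server_id sequence=$seq"
}

describe_gtid 0-1-853336   # gtid_current_pos of server1
describe_gtid 0-200-3      # gtid_binlog_pos of server3
```

Both positions are in domain 0, but the new master's binlog position names a different originating server (200) and a much lower sequence number than the position the rejoining server has already reached.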

If you run a command that creates a binlog event (e.g. FLUSH LOGS) on the new master server, the rejoin should work after that.
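
The workaround above could be scripted roughly as follows. This is a sketch under assumptions: the mariadb client can log in to the new master with privileges to run FLUSH LOGS, maxctrl is run on the MaxScale host, and the function name flush_and_rejoin and the host name in the usage comment are placeholders. The rejoin command itself is the one from the question.

```shell
# Hypothetical helper: create a binlog event on the new master, then ask
# the monitor to rejoin one server.
# $1 = new master host, $2 = new master port,
# $3 = monitor name,    $4 = server to rejoin
flush_and_rejoin() {
    # FLUSH LOGS writes a new event to the binary log of the new master.
    mariadb --host="$1" --port="$2" --execute="FLUSH LOGS" &&
    # Only attempt the rejoin if the flush succeeded.
    maxctrl call command mariadbmon rejoin "$3" "$4"
}

# Usage (placeholder host name for server3):
#   flush_and_rejoin server3.example.com 3306 DatabaseMonitor server1
#   flush_and_rejoin server3.example.com 3306 DatabaseMonitor server2
```

With auto_rejoin=true the monitor may pick the servers up on its own after the flush, in which case the manual rejoin call is not needed.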

markusjm