I have a MySQL-MMM cluster with three database servers (two masters and one slave). Replication recently broke when someone inserted rows directly into the slave database servers. After I discovered this, I re-established replication from db1 to db2 and db3. Replication is now running, and mmm_control show reports all servers as online:

[root@host ~]# mmm_control show
  db1(10.1.0.21) master/ONLINE. Roles: reader(10.1.0.31), writer(10.1.0.30)
  db2(10.1.0.22) master/ONLINE. Roles: reader(10.1.0.32)
  db3(10.1.0.23) slave/ONLINE. Roles: reader(10.1.0.33)

However, when I run all of the status checks, db1 is reported as having broken replication:

[root@host ~]# mmm_control checks all
db2  ping         [last change: 2010/11/24 03:57:48]  OK
db2  mysql        [last change: 2010/11/27 03:21:42]  OK
db2  rep_threads  [last change: 2010/11/27 03:23:19]  OK
db2  rep_backlog  [last change: 2010/11/24 03:57:48]  OK: Backlog is null
db3  ping         [last change: 2010/11/24 03:58:15]  OK
db3  mysql        [last change: 2010/11/27 03:19:21]  OK
db3  rep_threads  [last change: 2010/11/27 03:23:06]  OK
db3  rep_backlog  [last change: 2010/11/24 03:58:23]  OK: Backlog is null
db1  ping         [last change: 2010/11/24 03:57:48]  OK
db1  mysql        [last change: 2010/11/27 03:22:27]  OK
db1  rep_threads  [last change: 2010/11/27 02:14:46]  ERROR: Replication is broken
db1  rep_backlog  [last change: 2010/11/24 03:58:00]  OK: Backlog is null

What do I need to do to fix replication on db1, given that the databases appear to be in sync?

Dave Forgac

2 Answers

1

This means the replication threads on db1 are not running. Check your /var/log/mysql-mmm/ logs for possible clues. You could also try 'mmm_control set_offline db1' followed by 'mmm_control set_online db1' to see if toggling it restores the connection.

Checking your logs is key, though: you want to see why the replication thread is dead. Is a statement failing? If the error can't be skipped, db1 may be out of sync.
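
To see at a glance which checks are failing before going through the logs, the output of 'mmm_control checks all' can be filtered. This is only a rough sketch: the line format is inferred from the output pasted in the question, and the function name failing_checks is made up for illustration.

```python
import re

# Assumed line layout (inferred from the question's output):
#   host  check  [last change: <timestamp>]  <result>
CHECK_LINE = re.compile(
    r"^(?P<host>\S+)\s+(?P<check>\S+)\s+"
    r"\[last change: (?P<changed>[^\]]+)\]\s+(?P<result>.*)$"
)


def failing_checks(output):
    """Return (host, check, last_change, result) for every check that is not OK."""
    failures = []
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match and not match.group("result").startswith("OK"):
            failures.append((
                match.group("host"),
                match.group("check"),
                match.group("changed"),
                match.group("result"),
            ))
    return failures


# Three lines copied from the question's `mmm_control checks all` output.
OUTPUT = """\
db1  mysql        [last change: 2010/11/27 03:22:27]  OK
db1  rep_threads  [last change: 2010/11/27 02:14:46]  ERROR: Replication is broken
db1  rep_backlog  [last change: 2010/11/24 03:58:00]  OK: Backlog is null
"""

for failure in failing_checks(OUTPUT):
    print(failure)
```

The timestamp of the failing check gives a starting point for where to look in the logs.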

1

Log in to MySQL on db1 and run 'show slave status'; that will tell you what's wrong. If both Slave_IO_Running and Slave_SQL_Running show Yes, replication is fine and MMM might be confused.
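
If you want to script that check, something along these lines could work. It is a minimal sketch that assumes you have captured the vertical ('\G') output of 'show slave status' as text; the helper names and the sample text are made up for illustration and are not output from the asker's servers.

```python
def parse_vertical_output(text):
    """Parse vertical (\\G style) mysql client output into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        # Skip blank lines and row separators such as "*** 1. row ***".
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        # Split on the first colon only, so values containing colons survive.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_threads_ok(status):
    """True only if both the I/O thread and the SQL thread report 'Yes'."""
    return (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "Yes")


# Hypothetical sample: the SQL thread has stopped on an error.
SAMPLE = """\
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 1062
"""

status = parse_vertical_output(SAMPLE)
print(replication_threads_ok(status))  # prints False
print(status.get("Last_Errno"))
```

When the SQL thread is stopped, the Last_Errno and Last_Error fields in the same output usually say which statement failed.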

On a side note: I think your MMM config is sub-optimal. Judging by your mmm_control show output, if your primary master goes down, the traffic on your slave will double unless it is properly load balanced (it will pick up the reader role from the master that just went down, since no host can ever have more than one role more than any other host). A more advisable setup is two masters and two slaves, where the masters only have writer roles and each slave has one reader role. Just my 2 cents :)

Walter Heck
  • The db1 server does have a duplicate key error from when replication initially broke. I have verified, using checksums, that the databases are now replicating properly from db1 and are identical. I just need to restart replication on db1, right? What value do I set the log / position to? – Dave Forgac Dec 01 '10 at 16:30
  • OK, we found that the dbs were in sync so we just needed to get the current log & coordinates from db2 and start slave on db1. – Dave Forgac Dec 04 '10 at 16:27
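
The fix described in the last comment could look roughly like the sketch below. It assumes the File and Position values were read from 'SHOW MASTER STATUS' on db2 and that the statements are then run on db1. The function name and the sample coordinates are made up for illustration. This is only safe when the data sets are already identical and nothing is written to db2 between reading the coordinates and starting the slave.

```python
import re


def build_resync_statements(log_file, log_pos):
    """Return the statements to run on the slave, in order."""
    # Only accept plain binary log names (e.g. "mysql-bin.000042") so the
    # value can be embedded in the statement without quoting problems.
    if not re.fullmatch(r"[A-Za-z0-9_.\-]+", log_file):
        raise ValueError("unexpected binary log file name: %r" % (log_file,))
    pos = int(log_pos)
    if pos < 0:
        raise ValueError("binary log position must not be negative")
    return [
        "STOP SLAVE;",
        "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d;" % (log_file, pos),
        "START SLAVE;",
    ]


# Hypothetical coordinates; use the real File / Position values from db2.
for statement in build_resync_statements("mysql-bin.000042", "107"):
    print(statement)
```

Only the log file and position are changed here, so the master host and credentials already configured on db1 are kept. Afterwards 'show slave status' on db1 should report both threads as running.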