
I have master/slave MySQL replication running in production, but every day replication stops working, even though everything reports that it is OK:

SHOW SLAVE STATUS; reports both Slave_IO_Running and Slave_SQL_Running as Yes.

If I execute:

stop slave;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
start slave;

replication works again and Seconds_Behind_Master starts falling to zero. But I need to find out WHAT is breaking replication.
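Before running the skip, it is worth saving where the SQL thread actually was, because "Yes / Yes" alone does not show whether it is making progress. A minimal sketch, assuming the `mysql` client can log in non-interactively (for example through an option file such as `~/.my.cnf`). `key_slave_fields` is a hypothetical helper name: it reads `SHOW SLAVE STATUS\G` output and keeps only the position and error fields. Two snapshots taken some time apart can show whether `Exec_Master_Log_Pos` is advancing while both threads report Yes.

```shell
#!/bin/sh
# Sketch only: key_slave_fields is a hypothetical helper name.
# Reads "SHOW SLAVE STATUS\G" output on stdin and prints just the fields
# that say where the SQL thread is in the master's binary log, plus errors.
key_slave_fields() {
    awk '
        BEGIN {
            f = "Slave_IO_Running Slave_SQL_Running Relay_Master_Log_File"
            f = f " Exec_Master_Log_Pos Seconds_Behind_Master Last_SQL_Errno Last_SQL_Error"
            n = split(f, names, " ")
            for (k = 1; k <= n; k++) want[names[k]] = 1
        }
        {
            # \G output looks like "   Field_Name: value"; values may contain ":".
            line = $0
            sub(/^[ \t]+/, "", line)
            i = index(line, ":")
            if (i == 0) next          # skips the "*** 1. row ***" banner
            name = substr(line, 1, i - 1)
            value = substr(line, i + 1)
            sub(/^ /, "", value)
            if (name in want) print name "=" value
        }'
}

# Usage (assumes mysql can log in non-interactively, e.g. via ~/.my.cnf):
#   snap="slave-status-$(date +%Y%m%d-%H%M%S).txt"
#   mysql -e 'SHOW SLAVE STATUS\G' > "$snap"
#   key_slave_fields < "$snap"
```

Keeping the full snapshot file as well as the filtered view means nothing is lost if a field outside this list turns out to matter.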

Digging a little deeper, I found this in mysql.log:

2017-06-17 00:19:48 3084 [Note] 'SQL_SLAVE_SKIP_COUNTER=1' executed at relay_log_file='./mysqld-relay-bin.000055', relay_log_pos='632837719', master_log_name='mysql-bin.000046', master_log_pos='632837556' and new position at relay_log_file='./mysqld-relay-bin.000055', relay_log_pos='638878870', master_log_name='mysql-bin.000046', master_log_pos='638878707'
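The "executed at" and "new position at" values in this note mark where the skipped events begin and where the SQL thread resumed. Both are event boundaries, so they are safe starting points for `mysqlbinlog`. A minimal sketch with a hypothetical helper `skip_note_to_cmds` that turns such a note into commands. It only prints them and does not run anything. It assumes both positions lie in the same files, as they do in the note above. The master-log command is meant to be run against the master's binary log, and the relay-log command on the replica.

```shell
#!/bin/sh
# Sketch only: skip_note_to_cmds is a hypothetical helper name.
# Reads one "'SQL_SLAVE_SKIP_COUNTER=1' executed at ..." note on stdin and
# prints (does not run) mysqlbinlog commands covering exactly the skipped
# events. Assumes the old and new positions are in the same files, as in
# the note above; a skip that crosses a file rotation needs two commands.
skip_note_to_cmds() {
    # Extract every key='value' pair, one per line, without the quotes.
    grep -o "[[:lower:]_]*='[^']*'" | tr -d "'" | awk -F= '
        # Each key appears twice: first the position before the skip,
        # then the position after it.
        { seen[$1]++; v[$1, seen[$1]] = $2 }
        END {
            print "# on the master (master binary log):"
            printf "mysqlbinlog --no-defaults -v --start-position=%s --stop-position=%s %s\n",
                v["master_log_pos", 1], v["master_log_pos", 2], v["master_log_name", 1]
            print "# on the replica (relay log, path relative to its datadir):"
            printf "mysqlbinlog --no-defaults -v --start-position=%s --stop-position=%s %s\n",
                v["relay_log_pos", 1], v["relay_log_pos", 2], v["relay_log_file", 1]
        }'
}

# Usage: grep "SQL_SLAVE_SKIP_COUNTER" mysql.log | tail -n 1 | skip_note_to_cmds
```

`--stop-position` makes mysqlbinlog stop at the first event whose position is at or past the given value. Using the "new position" as the stop point therefore limits the output to the events the skip jumped over.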

So I executed:

[root@ip-172-31-19-9 mysql]# mysqlbinlog --no-defaults -v mysql-bin.000046 --start-position=632837554  --stop-position=632837558

and the result was:

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#170616 17:35:09 server id 1  end_log_pos 120 CRC32 0xcc698d98  Start: binlog v 4, server v 5.6.36-log created 170616 17:35:09
# Warning: this binlog is either in use or was not closed properly.
BINLOG '
/UBEWQ8BAAAAdAAAAHgAAAABAAQANS42LjM2LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAZiN
acw=
'/*!*/;
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 4653056, event_type: 68
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ip-172-31-19-9 mysql]#

What is this "event_type: 68"? Any clues?

Bentaye
costamatrix
  • The error shows a master log position of 632837556, but your mysqlbinlog invocation jumps to 632837554. You can't jump to an arbitrary offset in a binary log unless that position is precisely the beginning of an event -- the binlog file format does not have any framing structures to allow correction of offset errors during read, so the resulting error becomes nonsensical due to everything being misinterpreted because of the offset error. Try again with the correct offset and see what you find, please. AFAIK there is no event 68. – Michael - sqlbot Jun 17 '17 at 18:59
  • Also, capture the output of `SHOW SLAVE STATUS;` **before** you `SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;`. This, of course, should *never* be needed -- if you do this even once without understanding the cause, you need to rebuild your replica because replication generally stops *because* your replica is out of sync or was not set up correctly from the start, and the divergence only gets worse over time. – Michael - sqlbot Jun 17 '17 at 19:05
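The first comment's point about offsets can be made concrete. Every binlog v4 event starts with a 19-byte common header, and all of its fields are little-endian:

- a 4-byte timestamp
- a 1-byte type code
- a 4-byte server id
- a 4-byte event length
- a 4-byte next-event position
- 2 bytes of flags

Nothing in that header marks where an event begins, so a reader given the wrong offset simply decodes whatever bytes are there. A minimal sketch, using a hypothetical helper `event_header` built on `od`, that decodes those 19 bytes at a chosen offset:

```shell
#!/bin/sh
# Sketch only: event_header is a hypothetical helper, not a MySQL tool.
# Usage: event_header FILE OFFSET
# Decodes the 19-byte binlog v4 common header found at OFFSET in FILE.
# Fields are little-endian: timestamp(4) type_code(1) server_id(4)
#                           event_size(4) next_pos(4) flags(2).
# It trusts OFFSET blindly: the format has no sync marker, so a wrong
# offset yields garbage values rather than an error.
event_header() {
    od -A n -v -t u1 -j "$2" -N 19 "$1" | awk '
        # Collect the unsigned byte values (od may split them over lines).
        { for (i = 1; i <= NF; i++) b[n++] = $i }
        function le32(p) {
            return b[p] + 256 * (b[p+1] + 256 * (b[p+2] + 256 * b[p+3]))
        }
        END {
            if (n < 19) { print "short read" > "/dev/stderr"; exit 1 }
            printf "type_code=%.0f server_id=%.0f event_size=%.0f next_pos=%.0f\n",
                b[4], le32(5), le32(9), le32(13)
        }'
}

# Usage: event_header mysql-bin.000046 632837556
```

The following check uses the question's own numbers and assumes that mysql-bin.000046 here is the master's binary log. The Start event shown above begins with the bytes FD 40 44 59 (base64 `/UBEWQ`), which is the timestamp 0x594440FD. Any event written between that moment and the skip has a timestamp of the form 0x5944xxxx. The third little-endian byte of such a timestamp is 0x44, which is 68. Starting two bytes early (632837554 instead of 632837556) moves that byte into the type-code slot, and that would produce exactly `event_type: 68`.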
