0

I have simple master-slave replication setup. Everything worked with no problems for more than a year.

Few days ago replication lag started to rise and it's still rising with no obvious reason.

Seconds_Behind_Master is now more than 4 days and 6 hours.

Both SQL and IO slave threads are working, there is no problem with communication.

SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: s1.my.pl
                  Master_User: root
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.001674
          Read_Master_Log_Pos: 10948252
               Relay_Log_File: mysqld-relay-bin.000509
                Relay_Log_Pos: 63370742
        Relay_Master_Log_File: mysql-bin.001638
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 63370596
              Relay_Log_Space: 3523067396
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 371329
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)

Show full processlist produces strange output in one row "exit mysqld_lock_tables()", I'm not sure what it means

 SHOW FULL PROCESSLIST;
+-----+-------------+-----------+------+---------+--------+----------------------------------+-----------------------+-----------+---------------+-----------+
| Id  | User        | Host      | db   | Command | Time   | State                            | Info                  | Rows_sent | Rows_examined | Rows_read |
+-----+-------------+-----------+------+---------+--------+----------------------------------+-----------------------+-----------+---------------+-----------+
| 808 | system user |           | NULL | Connect |    892 | Waiting for master to send event | NULL                  |         0 |             0 |         1 |
| 809 | system user |           | NULL | Connect | 371477 | exit mysqld_lock_tables()        | NULL                  |         0 |             0 |         1 |
| 810 | root        | localhost | NULL | Query   |      0 | NULL                             | SHOW FULL PROCESSLIST |         0 |             0 |       401 |
+-----+-------------+-----------+------+---------+--------+----------------------------------+-----------------------+-----------+---------------+-----------+

SHOW_OPEN_TABLES command shows that no table is locked

Any hints what can I do to fix this or how to get more detailed diagnosis?

Matthias
  • 187
  • 3
  • 10
  • Any ideas how to show exact query that causes lag? – Matthias Jan 15 '13 at 18:39
  • This might be due to UPDATE/DELETE queries which are causing delay due to not using index or taking too much time to update. You need to enable slow query log (http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html). BTW, do you see any long running query in processlist? – Wasif Jan 16 '13 at 10:01

1 Answers1

1

One of the problems with MySQL replication is the serialization of I/O in the slave. While the master performs several I/O tasks at once, they get written to a single relay log, which is then executed by the slave, hence serializing the I/O. Therefore your slave should always be capable of more I/O throughput than your master.

To see if you've actually got a I/O problem, install ioprompt (homepage).

In your case, there could be another problem, you might have run into a bug. Have you tried

stop slave;
start slave;

to see if that fixes the problem? Next thing try stopping and starting your MySQL slave process.

Perleone
  • 135
  • 8
  • Yes, I tried restarting both replication slave and mysqld itself with no success. iotop shows: `Total DISK READ: 13.95 M/s | Total DISK WRITE: 2022.81 B/s` Disk array is hardware SAS RAID, it can handle more than 10x of such load – Matthias Jan 15 '13 at 18:36
  • @Matthias Are there any other processes showing up in iotop who seem to be waiting for i/o? You have to watch iotop for a while to get a feeling for it. Also use the "o" key to only show i/o-active processes. – Perleone Jan 15 '13 at 18:41
  • It's spare/backup server. There is nothing more running there, even no SQL SELECT queries. MySQL replication slave is only process making disk IO – Matthias Jan 15 '13 at 20:32
  • I didn't see ioprompt in the RHEL repo, but I was able to get similar numbers from atop. It was nice how it highlighted the disk utilization in red when it was a problem. – Greg Nov 04 '13 at 14:35