MySQL replication across availability zones in AWS resulting in lag

Question

I have 3 EC2 instances, each running MySQL. 2 are slaves replicating from the master.

The master is located in the availability zone eu-west-1b. One of the slaves is in the same zone, and the other is in eu-west-1a.

The slave located in eu-west-1a is experiencing considerable I/O lag (often 1 hour+), while the slave in eu-west-1b rarely falls behind the master at all.

The master database is a busy one that processes many writes per second, but it still seems strange to me that only one of the slaves lags this badly. If I scp a file from the master, both slaves receive it at roughly the same rate.

Is a lag like this normal?

The lag is normal for asynchronous replication. But are you sure the IO thread is behind? Normally the IO thread keeps up with the master, and the one that lags is the SQL thread. — akuzminsky, Aug 05 '14 at 14:02
If your traffic is very light, it may not be lagging... the IO thread's connection could be timing out in the EC2 network layer and reconnecting... http://dba.stackexchange.com/a/71846/11651 ...anything unusual in the error log on this slave or the master? — Michael - sqlbot, Aug 05 '14 at 19:02
@akuzminsky - a combination of Seconds_Behind_Master and observing the last inserted timestamps. During busy periods it goes up to about an hour behind. I know it is the IO thread because the Read_Master_Log_Pos is way behind. — Outspaced, Aug 06 '14 at 09:30
@Michael-sqlbot - the traffic is definitely not light, there are thousands of writes per minute. The lag goes up during busy periods - overnight it will drop back to 0 seconds behind. There are no IO errors to speak of. — Outspaced, Aug 06 '14 at 09:33
If you stop the SQL thread, does the IO thread's `Read_Master_Log_Pos` stay current? Sounds like a potential disk I/O blocking issue at the slave. It isn't really normal to see significant lag on the IO thread unless local disk I/O is choking it or you have limited bandwidth from the master (which you shouldn't, but...) — Michael - sqlbot, Aug 06 '14 at 11:46
`SET GLOBAL SLAVE_COMPRESSED_PROTOCOL = 1;` on the slave, followed by a stop/start on the IO thread might be worth trying, if you haven't already. This causes the slave to negotiate a compressed connection with the master, and will reduce the bandwidth required on the link, without any significant resource cost. — Michael - sqlbot, Aug 06 '14 at 11:51
@Michael-sqlbot I have the compressed connection active already, but still it lags. — Outspaced, Aug 08 '14 at 09:14
@Michael-sqlbot I have tried your suggestion for stopping the SQL thread - the IO thread keeps on lagging behind though — Outspaced, Aug 08 '14 at 09:15
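
For anyone trying to work out which thread is behind, as discussed in the comments above, here is a rough sketch of the comparison. It assumes you have already fetched one row from `SHOW MASTER STATUS` on the master and one row from `SHOW SLAVE STATUS` on the slave into plain dictionaries. The helper names `binlog_coords` and `classify_lag` and the sample values are hypothetical; only the column names come from MySQL. It also assumes binary log file names end in a numeric suffix such as `.000123`.

```python
def binlog_coords(file_name, position):
    """Turn ('mysql-bin.000123', 4567) into a comparable (123, 4567) tuple."""
    index = int(file_name.rsplit(".", 1)[1])
    return (index, int(position))


def classify_lag(master_status, slave_status):
    """Report which replication thread is behind, based on binlog coordinates."""
    # Where the master is currently writing.
    master = binlog_coords(master_status["File"], master_status["Position"])
    # How far the IO thread has read from the master's binary log.
    read = binlog_coords(
        slave_status["Master_Log_File"], slave_status["Read_Master_Log_Pos"]
    )
    # How far the SQL thread has executed, in master binlog coordinates.
    executed = binlog_coords(
        slave_status["Relay_Master_Log_File"], slave_status["Exec_Master_Log_Pos"]
    )
    return {
        "io_thread_behind": read < master,
        "sql_thread_behind": executed < read,
    }


# Hypothetical sample values, not taken from the servers in the question.
master_row = {"File": "mysql-bin.000123", "Position": 9000}
slave_row = {
    "Master_Log_File": "mysql-bin.000122",
    "Read_Master_Log_Pos": 500,
    "Relay_Master_Log_File": "mysql-bin.000122",
    "Exec_Master_Log_Pos": 500,
}
print(classify_lag(master_row, slave_row))
# {'io_thread_behind': True, 'sql_thread_behind': False}
```

The two rows are sampled at slightly different moments, so on a busy master a small gap on the IO side is expected. It is a gap that keeps growing that points at the IO thread.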
