Lots of "Query End" states in MySQL, all connections used in a matter of minutes

Question

This morning I noticed that our MySQL server load was going sky high. Max should be 8 but it hit over 100 at one point. When I checked the process list I found loads of update queries (simple ones, incrementing a "hitcounter") that were in query end state. We couldn't kill them (well, we could, but they remained in the killed state indefinitely) and our site ground to a halt.

We had loads of problems restarting the service and had to forcibly kill some processes. When we did we were able to get MySQLd to come back up but the processes started to build up again immediately. As far as we're aware, no configuration had been changed at this point.

So, we changed innodb_flush_log_at_trx_commit from 2 to 1 (note that we need ACID compliance) in the hope that this would resolve the problem, and set the connections in PHP/PDO to be persistent. This seemed to work for an hour or so, and then the connections started to run out again.

Fortunately, I set a slave server up a couple of months ago and was able to promote it and it's taking up the slack for now, but I need to understand why this has happened and how to stop it, since the slave server is significantly underpowered compared to the master, so I need to switch back soon.

Has anyone any ideas? Could it be that something needs clearing out? I don't know what, maybe the binary logs or something? Any ideas at all? It's extremely important that we can get this server back as the master ASAP but frankly I have no idea where to look and everything I have tried so far has only resulted in a temporary fix.

Help! :)

score 30 · Accepted Answer · answered Nov 05 '12 at 15:37

I'll answer my own question here. I checked the partition sizes with a simple df command and there I could see that /var was 100% full. I found an archive that someone had left that was 10GB in size. Deleted that, started MySQL, ran a PURGE LOGS BEFORE '2012-10-01 00:00:00' query to clear out a load of space and reduced the /var/lib/mysql directory size from 346GB to 169GB. Changed back to master and everything is running great again.

From this I've learnt that our log files get VERY large, VERY quickly. So I'll be establishing a maintenance routine to not only keep the log files down, but also to alert me when we're nearing a full partition.

I hope that's some use to someone in the future who stumbles across this with the same problem. Check your drive space! :)

Thanks, this was the fix for our issue. For others finding this answer if you are running a Galera mysql cluster, check all the servers for disk space as they will all get stuck on ‘query end’ even if it is just one of the nodes full. — chris, Oct 01 '14 at 08:42

score 7 · Answer 2 · answered Jan 12 '13 at 23:22

We've been having a very similar problem, where the mysql processlist showed that almost all of our connections were stuck in the "query end" state. Our problem was also related to replication and writing the binlog.

We changed the sync_binlog variable from 1 to 0, which means that instead of flushing binlog changes to disk on each commit, it allows the operating system to decide when to fsync() to the binlog. That entirely resolved the "query end" problem for us.

According to this post from Mats Kindahl, writing to the binlog won't be as much of a problem in the 5.6 release of MySQL.

`sync_binlog=0` has the potential problem in a crash: a Slave may say "binlog at an impossible position". — Rick James, Dec 06 '18 at 17:46

score 3 · Answer 3 · answered Oct 26 '16 at 01:55

3

In my case, it was indicative of maxing out the I/O on disk. I had already reduced fsyncs to a minimum, so it wasn't that. Another symptoms is "log*.tokulog*" files start accumulating because the system can't catch up all the writes.

answered Oct 26 '16 at 01:55

Mathieu Longtin

15,922
6
30
40

Tokudb is another matter. There is no hint that the OP is using that. – Rick James Dec 06 '18 at 17:55

score 0 · Answer 4 · edited Mar 30 '21 at 03:50

I had met this problem in my production use case . I was using replace into select in the table for archival and it has 10 lakhs records init . I interrupted the query in the middle and it went to killed state .

i had set the innodb_flush_log_at_trx_commit to 1 as well as sync_binlog=1 and it completely recovered in my case . And i have triggered my archival script again without any issues .

Lots of "Query End" states in MySQL, all connections used in a matter of minutes

4 Answers4

Linked