3

One of our replication master production servers is showing some really odd behavior for which I can't seem to find a solution.

Some threads on this server get stuck in state 'end'. This happens purely random, but when this occurs the thread is always updating or inserting rows in a table. The tables on which the query is running differs but is always on a MyISAM tables and in a range of three different tables.

When a thread goes in end-state all other threads get stucked with status locked. And when I say all threads I mean all, even threads that are not querying the same database or table.

The web-servers keep queuing queries to the database server without getting a response. This ultimately causes the web-servers to run out of sockets. At that moment all request to the domains get denied. The database servers shows no I/O or processor activity during the time the thread is in 'end' state. When this problem occurs I have to kill the thread manually. Even that does not do anything other that it's command-status changes to 'killed' . Most of the threads disappear after about 100 seconds.

The tables on which the threads are running queries when they go in end-state vary in size but are around 20 to 100 MB. At the moment that this problems occurs these tables are frequently updated but not in extreme ways. I think the updates range to 3 to 10 per second.

Some specifications about the server. The OS is CentOS 5.4 with MySQL 5.0.77-log. The processor is a AMD Opteron 2378, the harddrives are a RAID 1+0 array of Corsair X32 32GB SSD's.

I am thinking that the SSD's might be a part of the cause of the problem but I can't find any data to confirm that. The drives have performed quite stable for a while.

I have read the documentation on the MySQL reference guide about General Thread States which says that during the end-state the binary log and query cache gets updated. Maybe this has something to do with the cause of the problem? I would not no which configuration directives could give a working solution.

I have not tried to disable query cache, and I am not able to disable replication as this is a running production server. The fact that this is a running production server causes me to be careful when changing parameters such as the query cache settings unless I know for sure that this is going to solve the problem.

I have not been able to reproduce the problem with some of my test-scripts. When reading, writing and updating the tables that cause the problems heavy the problem does not occur. The occurrence of this problem is purely random.

1 Answers1

2

After further investigation it seemed that it was the query cache that caused the problem. Due to a large query cache MySQL seems to get confused when cleaning the cache. Reducing the query cache from a few GB's to about 512 MB solved the problem. For detailed information see: http://bugs.mysql.com/bug.php?id=39091