I run a site (Magento) on a Ubuntu 14.04.3 server running on a 32xCPU VMWare VPS.
When it is under heavy load, it typically receives 20-25 requests/second. In magento there is a specific UPDATE
query to a mysql table which normally takes ~1 ms (±0.2 ms) and runs ~200-300 times per minute (3-5 queries/second). However, during these heavy loads at intervals of 1-2 hours, this specific query suddenly takes 5-35 seconds to finish, which also stalls the entire website (even requests without this query).
I have monitored ram and cpu utilization and the load typically hovers around 22-28, both before the freeze and during. The freeze seems to be nearly permanent. It can last for at least 40 minutes and restarting mysql and php-fpm does not make it go away. RAM usage never goes beyond 10 % av available RAM and swap is never used.
The only way I have to solve it is by rebooting the VPS, which makes me believe there is a underlying system misconfiguration responsible for the freeze.
An interesting note, though: A few times the issue has solved itself without a reboot. What these cases have in common is that this query "only" takes 2-7 seconds to finish. At these times, the problem goes away in 10-15 minutes times.
So, any suggestions on what causes this and how I can track down the real underlying issue?
Update 1: The system load (1 minute load for 32 CPU cores) typically peaks at 27-28, but can rise to as high as 40 under extreme load. When this freeze occurs, the load is typically 22-27 both before and during the freeze. Most if not all available CPU cores (32) have some idle time during the freeze.
Update 2: I have made these changes to my.cnf:
innodb_buffer_pool_size = 10G (Innodb data is 5.5G)
key_buffer = 16M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
max_connections = 1024