
I run a site (Magento) on an Ubuntu 14.04.3 server running on a 32-vCPU VMware VPS.

When it is under heavy load, it typically receives 20-25 requests/second. In Magento there is a specific UPDATE query against a MySQL table which normally takes ~1 ms (±0.2 ms) and runs ~200-300 times per minute (3-5 queries/second). However, during these heavy loads, at intervals of 1-2 hours, this specific query suddenly takes 5-35 seconds to finish, which also stalls the entire website (even requests that don't involve this query).
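One way to capture exactly when and how badly the query degrades is MySQL's slow query log; a minimal my.cnf sketch (the threshold and log path here are illustrative, not from the original setup):

```ini
# my.cnf — log any statement slower than 1 second (values are illustrative)
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 1
```

With this enabled, the log records the exact timestamps and durations of the 5-35 second executions, which can then be correlated with system metrics.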

I have monitored RAM and CPU utilization, and the load typically hovers around 22-28, both before the freeze and during it. The freeze seems to be nearly permanent: it can last for at least 40 minutes, and restarting mysql and php-fpm does not make it go away. RAM usage never goes beyond 10 % of available RAM, and swap is never used.

The only way I have found to solve it is rebooting the VPS, which makes me believe an underlying system misconfiguration is responsible for the freeze.

An interesting note, though: a few times the issue has resolved itself without a reboot. What these cases have in common is that the query "only" takes 2-7 seconds to finish. At those times, the problem goes away within 10-15 minutes.

So, any suggestions on what causes this and how I can track down the real underlying issue?

Update 1: The system load (1 minute load for 32 CPU cores) typically peaks at 27-28, but can rise to as high as 40 under extreme load. When this freeze occurs, the load is typically 22-27 both before and during the freeze. Most if not all available CPU cores (32) have some idle time during the freeze.

Update 2: I have made these changes to my.cnf:

innodb_buffer_pool_size = 10G   (InnoDB data is 5.5G)
key_buffer          = 16M
max_allowed_packet  = 16M
thread_stack        = 192K
thread_cache_size   = 8
max_connections     = 1024
  • How much memory is assigned to the VM? How much disk space? Is there a swap partition? Have you updated any database parameters to optimise it for the current VM configuration? – AndrewNimmo Nov 09 '15 at 16:01
  • I was running with 256 GB RAM and a 950 MB swap partition, which was never utilized. I have updated innodb_buffer_pool_size to 10G (mysqltuner says 5.5 GB is used). Also these settings: key_buffer = 16M, max_allowed_packet = 16M, thread_stack = 192K, thread_cache_size = 8, max_connections = 1024 – Xyz Nov 09 '15 at 16:06
  • Is that high-frequency UPDATE query logging something to the database? Don't do that. I don't know Magento, but there are probably better options, and just about any alternative offered is likely to be better. – mc0e Nov 15 '15 at 07:26

3 Answers


Have you monitored disk I/O? Is there an increase in I/O wait times or queued transactions? It's possible that requests are queueing up at the storage level due to an I/O limit imposed by your host. Also, have you checked whether you're hitting your maximum allowed MySQL connections? If these queries are suddenly taking a lot longer to complete, they may not be leaving enough connections available for normal site traffic, because the other connections aren't closing fast enough.
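The connection-limit check can be done directly against the server's own counters; a sketch (run from any MySQL client):

```sql
-- Compare the configured limit with the peak usage since startup
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Max_used_connections';

-- During a stall, this output also lists transactions waiting on row locks
SHOW ENGINE INNODB STATUS;
```

If Max_used_connections is close to max_connections during the freeze, connection exhaustion is at least part of the picture.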

Safado
  • Thanks for a great answer. [1] No, I have not monitored disk I/O (yet). [2] I don't know how to monitor queued transactions. [3] mysqltuner says the highest usage of available connections is 15 % (157 used out of 1024 available). Unfortunately the site is currently not under this load, so the freeze doesn't happen anymore. But I will set up I/O monitoring, try to provoke the freeze via a load test, and watch what happens with I/O and queued transactions. – Xyz Nov 09 '15 at 16:21
  • I think the allowable connections problem isn't as likely, because that wouldn't cause your server to freeze; it would just make your site slow/non-responsive. I figured I'd just throw it out there. I would still bet on it being an I/O problem, though. When your server starts to choke, you can use the `ps` utility to look for any processes in state "D". This is uninterruptible sleep, which happens when a process is waiting on a resource, like disk. `iotop` is a good utility for monitoring per-process I/O usage in real time. – Safado Nov 09 '15 at 22:29
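The D-state check mentioned in the comment above can be scripted; a minimal sketch (output beyond the header line means some process is currently blocked in uninterruptible sleep, typically on disk I/O):

```shell
# Print the ps header plus any process whose state starts with "D"
# (uninterruptible sleep, usually waiting on disk).
ps -eo pid,stat,comm | awk 'NR == 1 || $2 ~ /^D/'
```

Running this in a loop (e.g. under `watch`) during a stall shows which processes pile up waiting on storage.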
  • +1 for monitoring I/O. By the sound of it, this is your bottleneck. – Oldskool Nov 13 '15 at 10:01

If you're using a VPS, you presumably can't see what is going on in other hosts on the same physical hardware.

It might be that heavy I/O load, possibly generated by you, is causing something in an entirely separate VPS to back up, which then takes time to resolve. That might be why restarting php and mysql on your system is not enough to get things back on track. It's interesting, though, that rebooting your VPS seems to resolve the issue. Any chance that's actually just a function of some time going by?

If you shut down php and mysql, you'd think there wouldn't be much left consuming resources on your system (I'm making a lot of assumptions there, but you should know more). Check that, though.

Look to see what activity is still going on. atop is a nice tool in that it can show per-process I/O activity, given sufficient permissions to do so. iostat is useful for looking at total disk activity for each device.
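If iostat (from the sysstat package) isn't installed, the same per-device totals can be read straight from the kernel's counters; a rough sketch (Linux only, counters are cumulative since boot, so sample twice and diff to get a rate):

```shell
# /proc/diskstats: field 3 = device name, field 6 = sectors read,
# field 10 = sectors written (both cumulative since boot).
awk '{ printf "%-10s sectors_read=%s sectors_written=%s\n", $3, $6, $10 }' /proc/diskstats
```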

If you don't have much disk activity in your VPS but performance is poor, then the problem is likely in another VPS, or maybe even the host. You'd need to talk to your hosting provider about that, but be aware that if it's you triggering the problem, you'd expect them to be concerned by that.

mc0e

It could be a system limitation if the VPS is under heavy load. Could you provide information about the VPS load when this occurs, as well as system logs?

svt
  • Certainly. The system load (1 minute load) typically peaks at 27-28, but can rise to as high as 40 under extreme load. When this freeze occurs, the load is typically 22-27 both before and during the freeze. All available CPU cores (32) have idle time. What system logs are of interest? – Xyz Nov 03 '15 at 15:56