Why does latency increase under high load?

Question

We have a system built on AWS. We use Beanstalk with autoscaling, and our database (MySQL) is hosted on RDS. We use Apache and PHP. We wanted to test the system under high load, so we chose large instances for the backend (20 instances, each with 4 CPUs and 15 GB of RAM) and a big instance for RDS (8 CPUs, 30 GB of RAM). Then we ran a marketing campaign, and a lot of users came to our website. We were monitoring latency the whole time, and it suddenly increased to 7 seconds. I would understand if that happened because CPU load was at 100% or there was no free memory. But no: CPU utilization was ~50% on the Apache servers and ~20% on the RDS server, the database was handling ~20 requests per second, and there was enough memory. So I don't know why the latency increased. Steps I took to investigate:

I saw the error "Too many connections", so I increased the max_connections option in RDS.
I increased the number of clients Apache can serve, following this article: http://www.genericarticles.com/mediawiki/index.php?title=How_to_optimize_apache_web_server_for_maximum_concurrent_connections_or_increase_max_clients_in_apache

But the problem still exists, and I don't know how to fix it. Why does latency increase when there are enough resources to handle everything? Please help. Thank you.

Remember that you are SHARING the underlying infrastructure with a number of other users competing for the same resources. — mdpc, Apr 21 '15 at 19:49

score 1 · Answer 1 · answered Apr 22 '15 at 00:42

To be honest, the cause could be any of dozens of things. You really need to systematically profile each component and narrow down where the latency is being introduced, rather than guessing where the problem is.
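
As a minimal sketch of that kind of profiling, each stage of a request can be wrapped in a timer and the totals compared. The stage names (`db_query`, `render`) and the `time.sleep` calls standing in for real work are hypothetical, and the stack in the question is PHP, so this Python version only illustrates the idea.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Wall-clock samples (in seconds) collected per named stage.
timings = defaultdict(list)

@contextmanager
def timed(stage):
    """Record how long the wrapped block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)

def summarize(samples):
    """Return (stage, call count, total seconds) rows, slowest stage first."""
    rows = [(stage, len(vals), sum(vals)) for stage, vals in samples.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)

def handle_request():
    # Hypothetical stages; replace the sleeps with the real calls.
    with timed("db_query"):
        time.sleep(0.02)
    with timed("render"):
        time.sleep(0.005)

if __name__ == "__main__":
    for _ in range(5):
        handle_request()
    for stage, count, total in summarize(timings):
        print(f"{stage}: {count} calls, {total:.3f}s total")
```

If the per-stage totals on the application servers stay small while the latency seen by clients is high, the delay is probably being added somewhere in front of the application, such as the load balancer or the network.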

Having said that, here are two things that come to mind:

ELB does not scale instantly

If you're using ELB (which I'm assuming you are) you need to either scale your traffic up slowly or request AWS support to pre-warm your load balancer.

We recommend that you increase the load at a rate of no more than 50 percent every five minutes. Both step patterns and linear patterns for load generation should work well with Elastic Load Balancing. If you are going to use a random load generator, then it is important that you set the ceiling for the spikes so that they do not go above the load that Elastic Load Balancing will handle until it scales (see Pre-Warming the ELB).

http://aws.amazon.com/articles/1636185810492479
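
The 50 percent rule quoted above can be turned into a ramp plan for a load test. This is a rough sketch only: the function name `ramp_schedule` and the starting and target request rates are assumptions for illustration, not values from the question or from AWS.

```python
def ramp_schedule(start_rps, target_rps, step_minutes=5, max_increase=0.5):
    """Return (minute, requests_per_second) steps from start to target.

    Each step raises the load by at most max_increase (50% by default)
    relative to the previous step, one step every step_minutes.
    """
    if start_rps <= 0 or target_rps < start_rps:
        raise ValueError("need 0 < start_rps <= target_rps")
    steps = [(0, start_rps)]
    minute, rps = 0, start_rps
    while rps < target_rps:
        minute += step_minutes
        # Grow by the allowed factor, but never overshoot the target.
        rps = min(target_rps, rps * (1 + max_increase))
        steps.append((minute, rps))
    return steps

if __name__ == "__main__":
    # Hypothetical numbers: ramp from 100 to 1000 requests per second.
    for minute, rps in ramp_schedule(100, 1000):
        print(f"t+{minute:>2} min: {rps:g} req/s")
```

With these example numbers, a tenfold increase takes 30 minutes. A marketing campaign can deliver the same jump in seconds, and that is the situation pre-warming is meant for.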

Disk IO

Disk IO could be an issue on either your application servers or your database. People often assume Disk IO is an infinite resource that never contributes to latency, because it seems that way on their unloaded local machines. If Disk IO turns out to be the bottleneck, look at provisioned IOPS.

Seems like you are right and the problem is because of ELB. I am contacting their support. — Volodymyr, Apr 22 '15 at 14:24
