
I have a simple Rails app running on Puma, with an Nginx reverse proxy in front of it configured in a standard way. They run on an AWS t2.micro instance.
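For context, "standard" here means Puma is started with its defaults or something close to the following sketch (the worker and thread counts are illustrative assumptions, not my exact settings):

    # 2 forked workers, each with up to 5 threads: at most 10 requests in flight
    bundle exec puma -w 2 -t 1:5 -e production -b tcp://127.0.0.1:3000

Whatever the exact numbers, workers × max threads caps how many requests the app can serve concurrently; anything beyond that waits in Nginx or in Puma's socket backlog.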

The MySQL database runs on another t2.micro instance.

If I run a JMeter load test for a simple login use case with 20 concurrent logins, I get the following result:

summary +      1 in 00:00:03 =    0.3/s Avg:  2542 Min:  2542 Max:  2542 Err:     0 (0.00%) Active: 20 Started: 20 Finished: 0
summary +     79 in 00:00:06 =   13.7/s Avg:  1734 Min:   385 Max:  3246 Err:     0 (0.00%) Active: 0 Started: 20 Finished: 20
summary =     80 in 00:00:09 =    9.2/s Avg:  1744 Min:   385 Max:  3246 Err:     0 (0.00%)

When I run the same test with 100 concurrent logins, I get the following result:

summary +    362 in 00:00:14 =   25.0/s Avg:  2081 Min:   381 Max:  9730 Err:     0 (0.00%) Active: 21 Started: 100 Finished: 79
summary +     38 in 00:00:13 =    3.0/s Avg:  4887 Min:   625 Max: 17995 Err:     0 (0.00%) Active: 0 Started: 100 Finished: 100
summary =    400 in 00:00:27 =   14.8/s Avg:  2347 Min:   381 Max: 17995 Err:     0 (0.00%)

The average and maximum response times go up, the maximum roughly fivefold. This is not a big surprise, but I cannot find the bottleneck when I look at the server's CPU and memory load. The maximum CPU usage during the test window is 36%, and memory consumption barely changes (up 5 MB).
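A rough sanity check, assuming near-steady state (which a short ramped test only approximates): Little's Law says in-flight requests ≈ throughput × average response time.

    20 users:   9.2/s × 1.744 s ≈ 16 requests in the system on average
    100 users: 14.8/s × 2.347 s ≈ 35 requests in the system on average

Five times the users produced only 1.6× the throughput, with the extra load showing up as queueing time rather than parallel service; that pattern usually points at a concurrency cap (application server pool, database connections) rather than CPU or memory exhaustion.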

My questions are: Where is the actual bottleneck? What is the scaling strategy? Should I put the Puma workers on separate EC2 instances?

I am not very experienced with setting up such a server, so all hints are welcome.

Michael
  • I had a similar issue on a t2.micro. Serving everything via an Nginx page cache, CPU never got above about 5%, but I hit a wall on transactions per second. Things to look at include CloudWatch, to see if you've run out of CPU credits, and EBS bandwidth and usage. You could also try a large T instance for testing, or an M instance, just to see what happens; spot instances are cheap. I don't think you need more servers: I run MySQL, Nginx, and PHP for WordPress on a t2.nano for a number of low-volume websites. – Tim Jun 15 '17 at 19:22
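One way to check the credit balance Tim mentions is the AWS CLI (the instance ID, region, and time window below are placeholders):

    aws cloudwatch get-metric-statistics \
      --region us-east-1 \
      --namespace AWS/EC2 \
      --metric-name CPUCreditBalance \
      --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
      --start-time 2017-06-15T00:00:00Z --end-time 2017-06-15T23:59:59Z \
      --period 300 --statistics Average

If the balance reaches zero during the test, the t2 instance is being throttled to its baseline, and the 36% CPU reading no longer reflects the capacity actually available.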

1 Answer


Food for thought here:

  • You are only examining two items in the finite resource model: CPU and memory. You have left out disk and network (see the monitoring sketch after this list).
  • If you are running your JMeter instance on a virtual machine inside AWS, in order to avoid charges for load originating outside of Amazon, then you need to consider an issue called "clock jump" on virtual machines, which distorts your response-time data. The system clock is virtualized inside a guest operating system: it slows when the system is under load and has to be periodically re-synchronized with the hypervisor host's base clock. When this occurs is neither known to you nor controllable by you, and because a resync can happen while timing records are open on events, it manifests as inflated average and maximum timing records in your test run. You can check for this by including a control element in your test design that runs on physical hardware, such as a control load generator. The results from the control element should help illustrate the amount of skew in the data from the non-control set.
  • You are running in a virtual machine, which makes it very difficult to get highly accurate numbers on what resources you are actually using, because of the way the hypervisor reports usage to the guest operating system.
  • This is where a code profiler or deep diagnostic tool can come in handy. Look explicitly for resources in your code that are allocated too early, called too often, or held too long before release. This "too early, too often, too long" view is where performance engineers look when optimizing a given business process, because code that falls into these buckets is prone to both response-time and scalability issues in production.
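To the first point, a minimal sketch for watching disk and network while the test runs; both tools come with the sysstat package on most Linux distributions:

    # Disk: per-device utilization, average wait, and queue size, refreshed every 5 s
    iostat -dx 5

    # Network: per-interface traffic in kB/s, refreshed every 5 s
    sar -n DEV 5

If %util or await climbs on the EBS volume, or interface throughput plateaus, you have found a candidate bottleneck that CPU and memory graphs will never show.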
James Pulley