Here's the scenario:
I'm running my Java/Spring app on Amazon EC2 Linux instances behind a load balancer, starting with 3 servers; the group can scale up or down as required.
Scale-up criteria: when CPU utilization stays above 30% for more than 10 minutes, add 2 servers.
Scale-down criteria: when CPU utilization stays below 15% for more than 10 minutes, remove 1 server.
Load (generated with blazemeter.com): increase the number of users steadily from 0 to 50 over about 15 minutes, then hold it constant from there onwards.
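To make the scaling rules above unambiguous, here is a rough Java sketch of how I understand them. It is only an illustration: `ScalingPolicySketch` and `desiredChange` are made-up names, not AWS APIs, and I'm assuming one averaged CPU sample per minute and treating "more than 10 minutes" as 10 consecutive samples past the threshold.

```java
import java.util.Collections;
import java.util.List;

// Illustrative model of the two scaling rules described above.
// All names here are hypothetical; this is not the AWS Auto Scaling API.
class ScalingPolicySketch {
    static final double SCALE_UP_THRESHOLD = 30.0;   // percent CPU
    static final double SCALE_DOWN_THRESHOLD = 15.0; // percent CPU
    static final int WINDOW_MINUTES = 10;            // assumed: one sample per minute

    // cpuPerMinute holds average CPU utilization per minute, most recent last.
    // Returns +2 (add two servers), -1 (remove one server), or 0 (no change).
    static int desiredChange(List<Double> cpuPerMinute) {
        int n = cpuPerMinute.size();
        if (n < WINDOW_MINUTES) {
            return 0; // not enough history to evaluate either rule
        }
        List<Double> window = cpuPerMinute.subList(n - WINDOW_MINUTES, n);
        if (window.stream().allMatch(c -> c > SCALE_UP_THRESHOLD)) {
            return 2;
        }
        if (window.stream().allMatch(c -> c < SCALE_DOWN_THRESHOLD)) {
            return -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Sustained 33% CPU, as seen around the 20-minute mark of the test.
        System.out.println(desiredChange(Collections.nCopies(10, 33.0)));
        // CPU sitting exactly at 30% does not cross the scale-up threshold.
        System.out.println(desiredChange(Collections.nCopies(10, 30.0)));
    }
}
```

With these rules, CPU holding at exactly 30% does not trigger anything, while a sustained 33% does, which matches what I saw in the test.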
Response:
- In the first 15 minutes, the load increased to 50 hits/second, then remained steady for another 5 minutes. CPU utilization stayed at around 30%, and response time was below 20 ms in this phase.
- While the load was at 50 hits/second, at around 20 minutes from the start, CPU utilization spiked to around 33% for more than 10 minutes, thereby triggering the scale-up. Response time increased dramatically, fluctuating between 5000 ms and 15000 ms.
- With the 2 additional servers (5 in total), CPU utilization went back down to 20%, but response time showed no sign of receding. It stayed between 5000 ms and 15000 ms for the rest of the test, until the load was removed.
My question is: why didn't the response time come back down to normal (around 20 ms) once CPU utilization was back to normal (around 20%)?
CPU Utilization chart
Response time chart
Thanks for your time :)