Here's the scenario:

I'm running my Java/Spring app on Amazon EC2 Linux instances behind a load balancer, starting with 3 servers, and the group can scale up or down as required.

Scale up criteria: When CPU Utilization goes above 30% for more than 10 mins, add 2 more servers.

Scale down criteria: When CPU Utilization decreases to below 15% for more than 10 minutes, remove one server.

Load (generated with blazemeter.com): Increase the number of users steadily from 0 to 50 over about 15 minutes, then hold it constant from there onwards.

Response:

  • In the first 15 minutes, the load increased to 50 hits/second and remained steady for another 5 minutes. CPU utilization stayed at around 30%. Response time was below 20ms in this phase.
  • While the load was at 50 hits/second, at around 20 minutes from the start, CPU utilization rose to around 33% for more than 10 minutes, thereby triggering the scale-up. Response time increased dramatically and fluctuated between 5000ms and 15000ms.
  • With 2 additional servers (server count now 5), CPU utilization went back down to 20%, but response time showed no sign of receding. It remained between 5000ms and 15000ms for the rest of the testing period, until the load was removed.

My question is, why do you think the response time didn't come down to normal (around 20ms) when the CPU utilization was back to normal (around 20% utilization)?

[Figure: CPU Utilization chart]

[Figure: Response time chart]

Thanks for your time :)

James
  • Are you using any external resources like S3 or a database in your application? Sounds like there may be a bottleneck outside of CPU usage. – Joachim Isaksson Jan 27 '12 at 10:50
  • Yes, the app interacts with a MySQL database, and I can see the exception below repeated many times in the log: – James Jan 27 '12 at 11:25
  • com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 67,261,014 milliseconds ago. The last packet sent successfully to the server was 67,261,014 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem. ... Caused by: java.net.SocketException: Broken pipe – James Jan 27 '12 at 11:25
  • Not sure whether the exception is related to the response time issue, but it would seem like a problem with your JDBC connection pool. Either connections are not released back to the pool as they should be, or the pool keeps inactive connections around for too long (18 hours sounds like a long time to keep an inactive connection around). – Joachim Isaksson Jan 27 '12 at 11:36
  • What do the load average and free memory say? Run `$ uptime` and `$ free -m`. – Roman Newaza Mar 03 '12 at 06:30
  • What tools are you using to monitor the instance? – Jeevan Dongre Nov 02 '12 at 16:03
  • What is the value of your MySQL wait_timeout? You can run `show variables;` from the mysql command line. – Rico Dec 16 '13 at 18:07
  • Also, are you running behind an ELB? If yes, have you looked at the ELB metrics? – Rico Dec 16 '13 at 18:11
  • Looks like a connection pooling issue with the database. – KNOWARTH May 30 '15 at 17:57
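
To put the number in the exception above into perspective, here is a minimal sketch that converts the reported idle time (67,261,014 ms) into hours and compares it with 28800 seconds, which is MySQL's documented default for `wait_timeout`. The class and method names are hypothetical, and using the default is an assumption, since the thread never states the value actually configured on the server.

```java
import java.time.Duration;

// Hypothetical helper, for illustration only.
final class IdleCheck {
    // MySQL's documented default wait_timeout in seconds (8 hours).
    // Assumption: the server in the question may be configured differently.
    static final long DEFAULT_WAIT_TIMEOUT_SECONDS = 28_800L;

    // Converts an idle time in milliseconds to fractional hours.
    static double idleHours(long idleMillis) {
        return idleMillis / 3_600_000.0;
    }

    // True when the connection has been idle longer than the server's
    // wait_timeout, i.e. the server has most likely already closed it.
    static boolean exceedsWaitTimeout(long idleMillis, long waitTimeoutSeconds) {
        Duration idle = Duration.ofMillis(idleMillis);
        Duration limit = Duration.ofSeconds(waitTimeoutSeconds);
        return idle.compareTo(limit) > 0;
    }

    public static void main(String[] args) {
        long idleMillis = 67_261_014L; // value taken from the exception message
        System.out.printf("idle for %.1f hours%n", idleHours(idleMillis));
        System.out.println("exceeds default wait_timeout: "
                + exceedsWaitTimeout(idleMillis, DEFAULT_WAIT_TIMEOUT_SECONDS));
    }
}
```

If the server does use the default, a connection that sat idle this long would have been closed on the MySQL side well before the app tried to reuse it, which is consistent with the `Broken pipe` cause and with the suggestions above to look at how the pool validates or expires connections.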

0 Answers