I am running two different load tests against my GAE/J application.
Load test #1 (LT1): each user invokes /rest/cheap1 every 2 seconds and /rest/cheap2 every 60 seconds.
Load test #2 (LT2): in addition to the URLs of LT1, each user invokes four further URLs, /rest/expensive{1,2,3,4}, each roughly once every 60 seconds.
Both load tests
- ramp from 0 to 10,000 concurrent users within 30 minutes,
- stay at 10,000 users for 30 minutes,
- and then ramp back down to 0 users within 30 minutes.
The main difference between the URLs is latency. On average, the latency of
- /rest/cheap{1,2} is 70 ms,
- /rest/expensive{1,2,3,4} is 600 ms.
When running LT1, GAE launches only a few instances and puts up to 70 requests/s on each instance. After adding /rest/expensive{1,2,3,4} in LT2, GAE starts significantly more instances but puts only 5-7 requests/s on each instance, resulting in a cost increase.
What can I do to use fewer instances? Is there a way to take advantage of the low latency of the most frequent operation, /rest/cheap1? There are many settings for the GAE scheduler, e.g. min/max pending latency, min/max idle instances, and instance classes. How can I use them to my advantage in this case?
How do latency changes for /rest/expensive{1,2,3,4} impact the instance count? E.g. would GAE launch half as many instances if the response times were halved?
How would the instance count be affected by setting the minimum pending latency to >= 600 ms?
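
For reference, here is a sketch of how I understand those knobs could be declared in appengine-web.xml, assuming the <automatic-scaling> element is available for my setup; the values below are placeholders for illustration, not my current configuration:

    <!-- appengine-web.xml: only the scaling-related parts, placeholder values -->
    <appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
      <!-- F2 is just an example instance class, not what I am running -->
      <instance-class>F2</instance-class>
      <automatic-scaling>
        <!-- keep a warm instance around for the frequent /rest/cheap1 calls -->
        <min-idle-instances>1</min-idle-instances>
        <max-idle-instances>automatic</max-idle-instances>
        <!-- scheduler waits at least 600 ms before starting a new instance
             for a pending request -->
        <min-pending-latency>600ms</min-pending-latency>
        <max-pending-latency>automatic</max-pending-latency>
        <!-- allow more concurrent requests per instance (default is 10) -->
        <max-concurrent-requests>50</max-concurrent-requests>
      </automatic-scaling>
    </appengine-web-app>

My understanding is that a higher minimum pending latency makes the scheduler wait longer before spinning up a new instance for a queued request, which is why I am asking about the 600 ms threshold above.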
Updates:
- Yes, I set threadsafe to true in my app.
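  In appengine-web.xml that is simply:

      <threadsafe>true</threadsafe>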