I am running two different load tests against my GAE/J application.
Load test #1 (LT1): each user invokes /rest/cheap1 every 2 seconds and /rest/cheap2 every 60 seconds.
Load test #2 (LT2): in addition to the URLs of LT1, each user invokes four further URLs, /rest/expensive{1,2,3,4}, each roughly once every 60 seconds.
Both load tests
- ramp from 0 to 10,000 concurrent users within 30 minutes,
- stay at 10,000 users for 30 minutes,
- and then ramp back down to 0 users within 30 minutes.
The main difference between the URLs is latency. On average, the latency of
- /rest/cheap{1,2} is 70 ms,
- /rest/expensive{1,2,3,4} is 600 ms.
When running LT1, GAE launches only a few instances and puts up to 70 requests/s on each instance. After adding /rest/expensive{1,2,3,4} in LT2, GAE starts significantly more instances but puts only 5-7 requests/s on each instance, resulting in a cost increase.
What can I do to use fewer instances? Is there a way to take advantage of the low latency of the most frequent operation, /rest/cheap1? There are many settings for the GAE scheduler, e.g. min/max pending latency, min/max idle instances, and instance classes. How can I use them to my advantage in this case?
How do latency changes for /rest/expensive{1,2,3,4} impact the instance count? E.g. would GAE launch half as many instances if the response times were halved?
How would the instance count be affected by setting the minimum pending latency to >= 600 ms?
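
For reference, here is a sketch of how I understand those knobs could be declared in appengine-web.xml, assuming the <automatic-scaling> element is available for my setup; the values below are placeholders for illustration, not my current configuration:

    <!-- appengine-web.xml: only the scaling-related parts, placeholder values -->
    <appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
      <!-- F2 is just an example instance class, not what I am running -->
      <instance-class>F2</instance-class>
      <automatic-scaling>
        <!-- keep a warm instance around for the frequent /rest/cheap1 calls -->
        <min-idle-instances>1</min-idle-instances>
        <max-idle-instances>automatic</max-idle-instances>
        <!-- scheduler waits at least 600 ms before starting a new instance
             for a pending request -->
        <min-pending-latency>600ms</min-pending-latency>
        <max-pending-latency>automatic</max-pending-latency>
        <!-- allow more concurrent requests per instance (default is 10) -->
        <max-concurrent-requests>50</max-concurrent-requests>
      </automatic-scaling>
    </appengine-web-app>

My understanding is that a higher minimum pending latency makes the scheduler wait longer before spinning up a new instance for a queued request, which is why I am asking about the 600 ms threshold above.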
Updates:
- Yes, I set threadsafe to true in my app.
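  In appengine-web.xml that is simply:

      <threadsafe>true</threadsafe>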