We have a Google App Engine Java app with 50 - 120 req/s depending on the hour of the day.
Our frontend appengine-web.xml is like that :
<instance-class>F1</instance-class>
<automatic-scaling>
<min-idle-instances>3</min-idle-instances>
<max-idle-instances>3</max-idle-instances>
<min-pending-latency>300ms</min-pending-latency>
<max-pending-latency>1.0s</max-pending-latency>
<max-concurrent-requests>100</max-concurrent-requests>
</automatic-scaling>
Usually 1 frontend instance manages to handle around 20 req/s. Start up time is around 15s.
I have a few questions :
- When I change the frontend Default version, I get thousands of Error 500 - Request was aborted after waiting too long to attempt to service your request. So, to avoid that, I switch from one version to the other using the Traffic splitting feature by IP address, going from 1 to 100% by steps of 5%, it takes around 5 minutes to do it properly and avoid massive 500 errors. Moreover, that feature seems only available for the default frontend module.
-> Is there a better way to switch versions ?
- To avoid thousands of Error 500 - Request was aborted after waiting too long to attempt to service your request., We must use at least 3 Resident (min-idle) instances. And as our traffic grows, even with 3, we sometimes still get massive Error 500. Am I supposed to go to 4 residents? I thought App Engine was nice because you only pay for the instances you use, so if in order to work properly we need at least half our running instances in Idle mode, that's not great, is it? It's not really cost effective as when the load is low, still having 4 idle instances is a big waste :( What's weird is that they seem to wait only 10s before responding 500 : pending_ms=10248
-> Do you have advices to avoid that ?
- Quite often, we also get thousands of Error 500 - A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104). I don't understand, there aren't any exceptions, and we get hundreds of them for a few seconds.
-> Do you have advices to avoid that ?
Thanks a lot in advance for your help ! ;)