
We have a Node.js App Engine service that serves an API.

It very rarely (about 1 in 500 requests) returns 502s to clients, and we see this error in our nginx log in Google Cloud Logging: upstream prematurely closed connection while reading response header from upstream.

These requests don't seem to reach our instance: while debugging, we set up logging that records every request as soon as it is received, and the failing requests never show up there.

The requests that are failing generally seem pretty random.

The problem appears very similar to the one described in https://groups.google.com/g/google-appengine/c/6gvlur9tXW0/m/bXzY_qAYBAAJ, but that thread was closed before it was resolved.

domdomegg

1 Answer


The solution

Set your server.keepAliveTimeout to 700 seconds (anything above 650 seconds works, as long as it leaves a good buffer for network latency). For example:

const http = require('http');
// keepAliveTimeout is given in milliseconds: 700_000 ms = 700 s
const server = http.createServer({ keepAliveTimeout: 700_000 }, app);
server.listen(port, () => console.log('Server listening'));
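If your Node.js version does not accept keepAliveTimeout as a createServer option, the same setting can be applied as a property on the server object. The following is a minimal sketch, not a definitive implementation: createKeepAliveServer and the two constants are hypothetical names, and the 50-second margin is an arbitrary choice on top of the 650 seconds described in this answer.

```javascript
const http = require('http');

// Assumed values: 650 s is the nginx keepalive_timeout on GAE cited in this
// answer; the 50 s safety margin is an arbitrary illustrative choice.
const NGINX_KEEPALIVE_MS = 650_000;
const SAFETY_MARGIN_MS = 50_000;

// Hypothetical helper: builds an HTTP server whose idle keep-alive timeout
// is longer than nginx's, so nginx is always the side that closes first.
function createKeepAliveServer(handler) {
  const server = http.createServer(handler);
  server.keepAliveTimeout = NGINX_KEEPALIVE_MS + SAFETY_MARGIN_MS; // milliseconds
  return server;
}
```

Typical usage would be createKeepAliveServer(app).listen(port), where app is your Express app or any request handler.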

The cause

If you are getting the error 'upstream prematurely closed connection', it means that the Google Front End (GFE) received the request and forwarded it to nginx, and nginx in turn forwarded it to the application. nginx then waits for a response, but instead of sending one, the application closes the connection, so no response can arrive on it. Since nginx still has to answer, it returns a 502 to the GFE.

This is usually the result of the application having a shorter connection keepalive timeout than nginx, which creates a race over which side terminates the connection first. The nginx keepalive_timeout Google has configured on GAE is 650 seconds, to avoid race conditions with the Google Cloud Load Balancers (GCLB), which have a timeout of 600 seconds.

The race occurs because, when timeouts get shorter deeper into your infrastructure, the outer layer can try to reuse a connection just as the inner service closes it. For example, if your app has a timeout of 5 seconds (the default set in Node.js):

  • t0: nginx receives a request, opens a connection and forwards it on
  • t5: nginx receives another request, reuses the connection (5 seconds is well below its own 650-second timeout), and forwards the request along it. At the same moment your app reaches its connection timeout and tells nginx it is closing the connection.
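
The rule behind this timeline is that each hop's idle timeout should be longer than that of the hop in front of it. As a rough illustration, the sketch below checks a chain of hops for violations of that rule. findRaceProneHops and the hop objects are hypothetical names made up for this example; the timeout values are the ones quoted in this answer.

```javascript
// Hypothetical helper: given hops ordered from outermost to innermost,
// returns the adjacent pairs where the inner hop's idle timeout is not
// longer than the outer hop's, i.e. where the inner side may close a
// connection the outer side still considers reusable.
function findRaceProneHops(chain) {
  const risky = [];
  for (let i = 0; i + 1 < chain.length; i++) {
    const outer = chain[i];
    const inner = chain[i + 1];
    if (inner.timeoutSec <= outer.timeoutSec) {
      risky.push(`${outer.name} -> ${inner.name}`);
    }
  }
  return risky;
}

// Timeouts as described in this answer, with Node.js left at its 5 s default.
const defaults = [
  { name: 'GCLB', timeoutSec: 600 },
  { name: 'nginx', timeoutSec: 650 },
  { name: 'app', timeoutSec: 5 },
];

console.log(findRaceProneHops(defaults)); // [ 'nginx -> app' ]
```

With the app's timeout raised to 700 seconds, the same check returns an empty list, which is the state the fix aims for.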

This is explained in more detail (in relation to just GCLB -> your app, rather than GCLB -> nginx -> your app): https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340

domdomegg