On an AWS Elastic Beanstalk deployment (single server), the Nginx server talking to the NodeJS/Express server on the same host occasionally complains about lost connections to the upstream:
2020/03/23 10:52:43 [error] 11443#0: *70 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 172.31.46.70, server: , request: "GET /health-check HTTP/1.1", upstream: "http://172.17.0.3:33080/health-check", host: "172.31.39.242"
2020/03/23 10:52:48 [error] 11444#0: *580 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 172.31.21.226, server: , request: "POST /api/app/importNutriwebData HTTP/1.1", upstream: "http://172.17.0.3:33080/api/app/importNutriwebData", host: "******"
2020/03/23 10:52:50 [error] 11443#0: *526 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 172.31.21.226, server: , request: "GET /health-check HTTP/1.1", upstream: "http://172.17.0.3:33080/health-check", host: "172.31.39.242"
This happens without any apparent reason, even for the /health-check URL, whose handler is a simple response.send("OK"). It seems to happen for random URLs.
The upstream 172.17.0.3 is on the very same machine that runs Nginx. All downstream connections come from CloudFront.
The same setup has worked fine for the past 3-4 years, but these errors began to increase 2-3 days ago. I can't think of anything that may have changed, except maybe 10% more requests or so. There may be about 50 long-lived EventStream connections, but never more than 100 concurrent connections. I'm pretty sure the NodeJS server is fine.
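In case the numbers matter, I'm counting the errors and the concurrent connections with commands along these lines (the error log path is the default Elastic Beanstalk one):

# resets logged by nginx so far
$ sudo grep -c 'Connection reset by peer' /var/log/nginx/error.log
# TCP connection states involving the Node upstream port
$ ss -tan | grep ':33080' | awk '{print $1}' | sort | uniq -c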
I've also tried upgrading Amazon Linux, rebooting the servers, and rebuilding the whole Elastic Beanstalk deployment - nothing changed.
I can run an endless curl loop against the upstream URL (http://172.17.0.3:33080/health-check) or even against the public CloudFront => Nginx URL, and I am unable to reproduce the problem despite thousands of requests over several minutes.
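The loop is essentially this (exact options aside):

# hit the upstream directly and only print a line when a request fails
$ while true; do curl -sf -o /dev/null http://172.17.0.3:33080/health-check || echo "failed at $(date)"; done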
The server has about 1.5 GB of RAM free, and the CPU is about 80% idle.
The open file handle counts seem low to me:
$ for pid in $(pidof nginx) ; do sudo ls /proc/$pid/fd | wc -w ; done
130
169
11
$ for pid in $(pidof node) ; do sudo ls /proc/$pid/fd | wc -w ; done
146
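Beyond the file handle counts, the only other resource checks I can think of are along these lines:

# per-process fd limits of the nginx workers
$ for pid in $(pidof nginx) ; do sudo grep 'open files' /proc/$pid/limits ; done
# overall socket summary (piles of TIME_WAIT / CLOSE_WAIT would be suspicious)
$ ss -s
# effective worker limits from the generated nginx config (nginx -T dumps the full config)
$ sudo nginx -T | grep -E 'worker_(connections|processes)'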
Could it be that Nginx runs out of some sort of resource? Is it a timing problem? What can I do to debug this further?
Any help greatly appreciated.