Occasional Bad Gateway (502) from nginx and Jetty on AWS after upgrading Dropwizard

Question

I'm running a Jetty REST server on AWS Elastic Beanstalk, behind nginx.

The application is built on the Dropwizard framework.

I recently upgraded Dropwizard from version 1.2.2 to 1.3.5.

Since then, some of my integration tests have been failing occasionally with a Bad Gateway (502) response.

These are the kinds of errors I see in the nginx error.log:

2018/08/14 05:03:07 [error] 12897#0: *11330 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.30.xx.xx, server: , request: "POST /some_url HTTP/1.1", upstream: "http://127.0.0.1:8080/some_url", host: "some_host.local"

2018/08/14 07:37:02 [error] 18575#0: *13099 writev() failed (104: Connection reset by peer) while sending request to upstream, client: 10.30.xx.xx, server: , request: "POST /some_url HTTP/1.1", upstream: "http://127.0.0.1:8080/some_url", host: "some_hostname"

The Dropwizard upgrade changed the Jetty version from 9.4.7.v20170914 to 9.4.11.v20180605. The Jersey client version stayed the same: 2.25.1.

There seem to have been some "Bad Gateway" responses before the upgrade too, but their number has increased significantly since. I can't see right now why this upgrade would have caused this, so I'm open to ideas.

The only significant changes from 9.4.7 to 9.4.11 were with HTTP/2 support. — Joakim Erdfelt, Aug 14 '18 at 18:38
I've seen this before on nginx, for me it was down to rate limiting in the nginx.conf. see https://www.nginx.com/blog/rate-limiting-nginx/ — Matt D, Aug 14 '18 at 18:46
@MattD according to this link, I would expect a 503 error code, and completely different error in the nginx log. — Uziel Sulkies, Aug 14 '18 at 19:04
Without more information, this smells like the server is sending a response quickly (before the request has been fully read), and closing the connection. — Joakim Erdfelt, Aug 14 '18 at 19:51
@JoakimErdfelt which additional logging / info can help us know that for sure? And what can be the root cause for this server behaviour? — Uziel Sulkies, Aug 15 '18 at 05:10
I found this issue on jetty github. I suspect it is concerned with my problem: https://github.com/eclipse/jetty.project/issues/2791 — Uziel Sulkies, Aug 15 '18 at 09:59

score 0 · Answer 1 · answered Aug 29 '18 at 19:08

The problem is discussed here: https://github.com/dropwizard/dropwizard/issues/2461

The suggested solutions are:

Use a more powerful EC2 instance type.
Manually set the "acceptQueueSize" configuration parameter to a value higher than the OS default (suggested: 256).
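
For the second option, a minimal sketch of how this might look in the Dropwizard YAML configuration file. It assumes the application uses the default server factory with a single plain `http` application connector, and takes port 8080 from the upstream address in the nginx log above; adjust both to match your own setup.

```yaml
server:
  applicationConnectors:
    - type: http
      port: 8080            # assumed from the nginx upstream (127.0.0.1:8080)
      acceptQueueSize: 256  # TCP accept queue (listen backlog); suggested value from the linked issue
```

On Linux the kernel may cap the effective backlog (for example via net.core.somaxconn), so a value set here may not take full effect if the OS limit is lower.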
