
We're facing a really weird error with HTTP DELETE: occasionally, the user gets a 504 (Gateway Timeout) error in the browser.

Requests flow through the following chain:

Browser -> Akamai -> AWS ELB -> Nginx -> AWS Application ELB -> Application

We've traced the request through the stack. When the error occurs, the request appears in Nginx's access.log but not in the AWS Application ELB's access log, so Nginx is the one timing out: it waits 60 seconds and then returns 408. Judging from the access log and the debug log, Nginx appears to be proxying the request to the application, but the request never gets through.

Going further, the failed request doesn't appear in the tcpdump capture on the Nginx server either.
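One way to repeat the access-log comparison over many requests is to collect every DELETE that Nginx logged and subtract the ones the Application ELB logged. This is a minimal Python sketch under stated assumptions: both logs quote the request line as `"METHOD target HTTP/x"` (Nginx's default `combined` format does; ALB logs put an absolute URL in the target position, which is reduced to its path here), and `request_keys` and `missing_upstream` are hypothetical helpers, not part of any tool.

```python
import re
from urllib.parse import urlsplit

# Matches a quoted request line such as "DELETE /api/v2/cart/... HTTP/1.1".
# Assumption: both the Nginx and the ALB logs quote the request this way.
REQUEST_RE = re.compile(r'"([A-Z]+) (\S+) HTTP/[\d.]+"')


def request_keys(lines):
    """Return the set of (method, path) pairs found in the given log lines."""
    keys = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines without a recognizable request field
        method, target = match.groups()
        # ALB logs record an absolute URL and Nginx a relative one; keeping
        # only the path (no scheme, host, or query) makes them comparable.
        keys.add((method, urlsplit(target).path))
    return keys


def missing_upstream(nginx_lines, alb_lines, method="DELETE"):
    """(method, path) pairs Nginx logged that never show up in the ALB log."""
    nginx = {key for key in request_keys(nginx_lines) if key[0] == method}
    return nginx - request_keys(alb_lines)
```

Matching on method and path alone is deliberately coarse: a retried DELETE for the same line item that did reach the ELB would hide the earlier failure, so timestamps would be needed for an exact per-request match.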

Some facts we've gathered:

  • The error doesn't happen in Safari, but occasionally happens in Chrome and Firefox.
  • In Firefox, if we set network.http.max-connections-per-server to 10, the problem disappears. With any value above 10, it comes back.
  • Disabling HTTP/2 in Akamai reduces the number of occurrences of the problem.
  • The problem seems to happen only with the DELETE HTTP verb.
  • We've tried pointing Nginx directly at an application instance (bypassing the ELB), and the problem persists.
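To test whether the failure depends on connection reuse, a small replay script could send DELETEs back to back over one persistent HTTP/1.1 connection and time each one. This is a minimal sketch assuming a plain-HTTP hop can be reached directly (for example Nginx itself); `time_deletes_on_one_connection`, the host, the port, and the paths are hypothetical placeholders, and the script does not reproduce a browser's HTTP/2 or multi-connection behavior.

```python
import http.client
import time


def time_deletes_on_one_connection(host, port, paths, timeout=70.0):
    """Send each DELETE over a single keep-alive connection and time it.

    Returns a list of (path, status, seconds). A status of None means the
    request raised (e.g. a socket timeout) instead of returning a response.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    results = []
    try:
        for path in paths:
            start = time.monotonic()
            try:
                conn.request("DELETE", path)
                resp = conn.getresponse()
                resp.read()  # drain the body so the connection can be reused
                status = resp.status
            except (OSError, http.client.HTTPException):
                status = None
                # Reset the connection; http.client reopens it automatically
                # on the next request().
                conn.close()
            results.append((path, status, time.monotonic() - start))
    finally:
        conn.close()
    return results
```

An entry with status `None`, or a duration near the 60-second mark, would suggest the hop stalled on a reused connection; for the HTTPS front end, `http.client.HTTPSConnection` could be swapped in with the same arguments.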

It looks like there is a problem with how persistent connections are managed somewhere in our stack. However, our settings seem correct; for example, the keep-alive timeouts increase along the chain: 300 seconds in Akamai, 302 in the first ELB, 304 in Nginx, and so on.
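The keep-alive values above follow a common rule of thumb: on each connection, the server side should keep idle connections open longer than its client, so the client always closes first and never sends a request on a socket the server is about to drop. A minimal Python check of that ordering could look like this. It assumes one idle timeout per hop (in reality each proxy has separate client-facing and upstream settings), includes only the three values quoted above, and `ordering_violations` is a hypothetical helper.

```python
# Idle keep-alive timeouts in seconds, ordered from the client-facing hop
# inward. Only the three values quoted in the post are listed; the remaining
# hops ("and so on") are intentionally left out.
HOPS = [
    ("Akamai", 300),
    ("AWS ELB", 302),
    ("Nginx", 304),
]


def ordering_violations(hops):
    """Return (client, server) pairs where the server does not outlive its client.

    Each adjacent pair in `hops` is treated as one connection: the earlier
    hop is the client and the later hop is the server it connects to.
    """
    return [
        (client, server)
        for (client, client_timeout), (server, server_timeout) in zip(hops, hops[1:])
        if server_timeout <= client_timeout
    ]
```

An empty list means every hop keeps idle connections open longer than the hop in front of it, which is the ordering the values above already satisfy.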

I'm also attaching the Nginx debug log of the request for anyone interested.

The failed request in this case is: DELETE /api/v2/cart/ULBlIlptUun70M3h4cPm1t7Paos/line/122555881 HTTP/1.1

Debug log

Thanks!
