I am currently having a strange problem. Our production stack is ELB + Nginx + gunicorn + Django.

For some APIs, requests take a long time to get from the ELB to Nginx, and I don't know how this is possible. The ELB and Nginx logs are attached below. We are using m4.large instances in production, and there is no surge queue on the ELB.

ELB time

2016-11-23T02:31:55.756089Z 52.74.181.254:44708 172.31.18.32:80 0.000046 11.566721 0.000009 200 200 0 1219 GET

Nginx time

23/Nov/2016:02:32:08 +0000] "GET" 200 1231 "-" 0.664 0.664

Both log entries are for the same request. If anyone has faced the same problem, any help would be appreciated.

Thanks.

  • It seems like there exists the possibility that *something* happens extremely early in the connection acceptance process that causes the Nginx timestamp to reflect an acceptance time later than the connection actually arrived at the instance -- something fairly obscure like a reverse DNS lookup timeout or a socket backlog -- which would mean the delay is not "between" ELB and Nginx but actually on the instance. Discard your assumptions until you can prove the timestamp's meaningfulness. Packet capturing to a file, with `tshark` on the Nginx server will help establish this more authoritatively. – Michael - sqlbot Nov 24 '16 at 16:58
  • @Michael-sqlbot The gunicorn and Nginx timestamps match, so the Nginx time is correct (the timestamp denotes the time the request leaves). – Alankar Choudhary Nov 24 '16 at 19:26
  • @Michael-sqlbot And I have checked many requests; the timestamp doesn't seem to be the problem. – Alankar Choudhary Nov 24 '16 at 19:30
  • The timestamps of Nginx and gunicorn match, of course, but that doesn't prove anything because they are both *after* the suspected problem. Again, I say, you need a packet trace. You cannot troubleshoot this without using the correct tools. ELB says Nginx is taking >11 seconds, Nginx says < 1 second. There is nothing between them, so at least one of them is *necessarily* wrong, and a packet trace is the only way to conclusively understand which side is making a measurement error and incorrectly timing the event. Until then, you cannot trust the accuracy of *either* the Nginx *or* ELB log. – Michael - sqlbot Nov 24 '16 at 21:18
  • @Michael-sqlbot From the AWS documentation: "each log contains information such as the time the request was received, the client's IP address, latencies, request paths, and server responses", and from the Nginx docs: "request_time: This shows how long Nginx dealt with the request. upstream_response_time: Gives us the time it took our upstream server". So the timestamps are obviously correct; the issue is that the request takes time to get from the ELB to Nginx. – Alankar Choudhary Nov 25 '16 at 07:28
  • You are continuing to labor under a flawed premise. The first ELB timer, `request_processing_time` is *"The total time elapsed, in seconds, from the time the load balancer received the request until the time it sent it to a registered instance."* **This occurs over a TCP connection from ELB to instance with nothing in between to delay it, and ELB claims it was sent in 0.000046s.** Your task is to disprove this assertion and the Nginx logs are not sufficient for the task. Packet tracing is your best source of truth here and I fail to understand why you resist this troubleshooting route. – Michael - sqlbot Nov 25 '16 at 12:16
  • @Michael-sqlbot Thanks for your time, now I understand the problem. – Alankar Choudhary Nov 25 '16 at 13:12
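
To make the disagreement discussed in the comments concrete, here is a minimal Python sketch that places both log entries on one timeline. It rests on several assumptions that the thread does not confirm: that the three ELB timers are `request_processing_time`, `backend_processing_time` and `response_processing_time` in that order, that the ELB timestamp marks when the load balancer received the request, that the first `0.664` in the Nginx line is `$request_time`, and that Nginx writes its one-second-resolution timestamp when the request finishes. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the log excerpts in the question.
elb_received = datetime(2016, 11, 23, 2, 31, 55, 756089, tzinfo=timezone.utc)
request_processing = 0.000046    # assumed: ELB received request -> sent to instance
backend_processing = 11.566721   # assumed: sent to instance -> response started arriving
nginx_logged = datetime(2016, 11, 23, 2, 32, 8, tzinfo=timezone.utc)
nginx_request_time = 0.664       # assumed to be Nginx's $request_time


def elb_timeline(received, request_s, backend_s):
    """Return (time sent to the instance, time the response started arriving)."""
    sent = received + timedelta(seconds=request_s)
    first_response = sent + timedelta(seconds=backend_s)
    return sent, first_response


def nginx_start_window(logged, request_time_s):
    """Return the (earliest, latest) moment Nginx could have started its timer.

    The log timestamp only has one-second resolution, so the real
    completion time lies somewhere in [logged, logged + 1s).
    """
    earliest = logged - timedelta(seconds=request_time_s)
    latest = earliest + timedelta(seconds=1)
    return earliest, latest


sent, first_response = elb_timeline(
    elb_received, request_processing, backend_processing
)
earliest, latest = nginx_start_window(nginx_logged, nginx_request_time)

# Time between ELB claiming it sent the request and the earliest
# moment Nginx could have started timing it.
unaccounted = (earliest - sent).total_seconds()

print("ELB sent request to instance:", sent.isoformat())
print("ELB saw response start:      ", first_response.isoformat())
print("Nginx timer started between: ", earliest.isoformat(), "and", latest.isoformat())
print("Unaccounted seconds (at least):", unaccounted)
```

If those assumptions hold, more than 11 seconds pass between the moment the ELB says it sent the request and the earliest moment Nginx could have started timing it. The logs alone cannot say where that time went, which is why a packet capture on the instance is the suggested next step.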

0 Answers