
I have a service deployed on Amazon Web Services (AWS): 2 instances behind an Elastic Load Balancer (ELB). All three availability zones (us-west-2a, us-west-2b, us-west-2c) are enabled on the ELB, but only 2 of the 3 zones have instances running in them.

The issue is that, even though the traffic/load is not very high, I still get HTTP 504 errors from the ELB fairly often.

The log lines read like this:

-1 -1 -1 504 0 0 0

In order, the fields are request_processing_time, backend_processing_time, response_processing_time, elb_status_code, backend_status_code, received_bytes, and sent_bytes. A description of what each field and response means can be found in the ELB access log documentation.
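To make the mapping concrete, here is a minimal sketch that pairs each value from the log excerpt above with its field name. The helper name `label_elb_fields` is made up for illustration; it only handles the seven-value excerpt quoted here, not a full access-log line.

```shell
#!/bin/sh
# Label the seven values from the excerpt with their access-log field names.
# label_elb_fields is a hypothetical helper, not part of any AWS tooling.
label_elb_fields() {
    printf '%s\n' "$1" | awk '{
        split("request_processing_time backend_processing_time response_processing_time elb_status_code backend_status_code received_bytes sent_bytes", names, " ")
        for (i = 1; i <= 7; i++) printf "%s=%s\n", names[i], $i
    }'
}

label_elb_fields "-1 -1 -1 504 0 0 0"
```

Read this way, the excerpt shows the ELB answering with a 504 (elb_status_code) while backend_status_code is 0 and all three timing fields are -1.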

The ELB idle timeout is 60 seconds. KeepAlive is 'On' on the backend instances. Request latency as reported by the ELB is in check. I have tried increasing KeepAliveTimeout, but to no avail.

Does anyone have any idea about how to proceed? I don't even know the root cause of this issue.

PS: As a second question, there are a few cases (far fewer than the cases where the ELB returns a 504 without the backend even accepting the request) where the backend itself returns a 504 and the ELB forwards it to the client. To the best of my knowledge, HTTP 504 should be returned only by a proxy whose backend is timing out. How can a server itself return a 504?

Harshdeep
  • Can you share the cloudwatch metrics available on the ELB? Also can you share what kind of ELB healthcheck you have in place and how many instances are available on the ELB? – Shibashis May 17 '16 at 16:50
  • Do you have `MaxRequestsPerChild` configured in your _e.g._ Apache instance? If the ELB's persistent connection to a backend instance is terminated just as the ELB is using that connection for a request from the frontend client, this too can result in an HTTP 504 seen by the client. – Castaglia May 17 '16 at 17:31
  • @Shibashis 2 instances are attached to ELB as mentioned in the question. Detailed monitoring is enabled which enlists Backend_5xx, ELB_5xx, Latency, HealthyHosts, RequestCount and such. Healthcheck is a simple HTTP call with timeout of 5 secs and frequency of 10 secs. – Harshdeep May 17 '16 at 17:36
  • @Castaglia I am using apache 2.4 with default configuration for keep-alive related stuff, MaxKeepAliveRequests 100, KeepAliveTimeout of 5 secs. – Harshdeep May 17 '16 at 17:38
  • Does the healthy host count drop? or does it remain constant at 2. – Shibashis May 17 '16 at 17:38
  • @Harshdeep you might try tuning that `MaxKeepAliveRequests` number higher (_e.g._ 1000), and see if that affects the frequency of HTTP 504s. – Castaglia May 17 '16 at 17:40
  • @Shibashis It remains constant. – Harshdeep May 17 '16 at 17:43
  • So the connection from elb to instance is fine, the issue may be the way apache has been configured – Shibashis May 17 '16 at 17:45
  • @Shibashis can you please elaborate. It's a very generic apache setup. I haven't done anything fancy so would want to know what may be potential pitfalls of using a generic config. – Harshdeep May 17 '16 at 17:51
  • I am not sure what's the issue with Apache .. You may have to run diagnostic by continuously executing request on the instance directly and see how it goes. – Shibashis May 17 '16 at 18:03

2 Answers


So that it might help others in the future, I am publishing my findings here:

1) These 504 0 HTTP errors were mainly caused by logrotate reloading Apache instead of doing a graceful restart. The current AWS config does the following:

/sbin/service httpd reload > /dev/null 2>/dev/null || true

So replace the service command with either apachectl -k graceful or /sbin/service httpd graceful.

File location on my EC2 instance: /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf

2) The default logrotate frequency in AWS (once every hour) was too high, at least for my use case, and it was reloading Apache every hour, so I reduced that as well.
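The change from point 1 can be sketched roughly as follows. This runs against a temporary stand-in file rather than the real config. The `postrotate`/`endscript` wrapper is an assumption about how the file is laid out; only the `service httpd reload` line comes from the config quoted above.

```shell
#!/bin/sh
# Sketch: swap the reload for a graceful restart in a logrotate config.
# Works on a temporary stand-in file, not the real one under /etc.
conf=$(mktemp)
cat > "$conf" <<'EOF'
postrotate
    /sbin/service httpd reload > /dev/null 2>/dev/null || true
endscript
EOF

# Rewrite only the httpd reload invocation; a .bak copy keeps the original.
sed -i.bak 's#/sbin/service httpd reload#/sbin/service httpd graceful#' "$conf"

cat "$conf"
```

Matching the whole command rather than the bare word `reload` avoids touching any other line that happens to contain that word. On a real instance, point the `sed` at the actual config path and check the result before the next rotation.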

Harshdeep
  • So how did you change the default Beanstalk config to use `apachectl -k graceful`? –  Oct 28 '16 at 13:49
  • @MaartenSander Using Elastic Beanstalk container_commands in ebextensions, something like this: `command: sed -i 's/reload/graceful/g' /etc/logrotate.d/logrotate.elasticbeanstalk.httpd.conf`. This runs on each deployment, so every machine that comes up as a result of autoscaling has the same properties. Also, if you upgrade your Elastic Beanstalk environment, the file path may change from `/etc/logrotate.d/logrotate.elasticbeanstalk.httpd.conf`, so be wary of that whenever you upgrade the Elastic Beanstalk version. – Harshdeep Oct 31 '16 at 06:22

When a backend connection times out, the ELB writes -1 in the backend_processing_time column of its access log. I think what's happening is that some of your requests take longer than 60s for your backend to process. To confirm this, can you check your latency metrics? Please switch to the Maximum statistic when viewing this metric. If latency frequently reaches 60s, that confirms my guess.

Once that is confirmed, you might want to increase the idle timeout of your ELB and backend.
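As a complement to the latency metric, the access log itself can be scanned for entries where the ELB returned a 504 with backend_processing_time set to -1. The sketch below assumes the Classic ELB access-log layout, in which the three processing times are fields 5 to 7 and elb_status_code is field 8. The two sample entries are synthetic, written only to exercise the filter, and `count_backend_timeouts` is a made-up helper name.

```shell
#!/bin/sh
# Count entries where the ELB returned 504 and backend_processing_time is -1.
# Assumes Classic ELB field order: $5..$7 processing times, $8 elb_status_code.
count_backend_timeouts() {
    awk '$8 == 504 && $6 == -1 { n++ } END { print n + 0 }' "$1"
}

# Synthetic sample entries (made up for illustration, not from the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
2016-05-17T16:50:00.000000Z my-elb 192.0.2.10:50000 - -1 -1 -1 504 0 0 0 "GET http://example.com:80/ HTTP/1.1" "curl/7.38.0" - -
2016-05-17T16:50:01.000000Z my-elb 192.0.2.10:50001 10.0.0.5:80 0.000045 0.012 0.000021 200 200 0 512 "GET http://example.com:80/ HTTP/1.1" "curl/7.38.0" - -
EOF

count_backend_timeouts "$sample"
```

Comparing the timestamps of the matching entries with the Maximum latency graph shows whether the 504s line up with slow requests or occur while latency is low.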

James Gan
  • As mentioned in the question, request latency is in check. In the latest occurrence, the maximum latency I see for the ELB is only 3 seconds. – Harshdeep May 21 '16 at 21:12