
We started using HAProxy recently and everything has worked fine so far. However, we are randomly seeing errors like this on the backend servers (Debian + Apache + FCGID + PHP 7):

[Wed Jan 31 08:22:22 2018] [warn] [client XXXX:XXXX::1] (70007)The timeout specified has expired: mod_fcgid: can't get data from http client, referer: XXX

And on the HAProxy server this results in the following log:

Jan 31 08:22:22 localhost haproxy[4029466]: 127.0.0.1:41408 [31/Jan/2018:08:05:12.587] http-in server/myservername 0/0/4/1029610/1029615 500 569 - - CD-- 64/64/0/0/0 0/0 "POST /file.php HTTP/1.1" 

As you can see, the Tr and Tt times are quite high (around 1,000 seconds) for these requests.
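
For reference, the timer field in that log line can be pulled apart like this (a sketch based on the HAProxy 1.7 httplog layout, where the timers are Tq/Tw/Tc/Tr/Tt in milliseconds; the field position is taken from the sample line above):

```shell
# Split the Tq/Tw/Tc/Tr/Tt timers (in milliseconds) out of the sample
# httplog line above. With this layout the timer field is field 10.
logline='Jan 31 08:22:22 localhost haproxy[4029466]: 127.0.0.1:41408 [31/Jan/2018:08:05:12.587] http-in server/myservername 0/0/4/1029610/1029615 500 569 - - CD-- 64/64/0/0/0 0/0 "POST /file.php HTTP/1.1"'
echo "$logline" | awk '{ split($10, t, "/"); printf "Tq=%s Tw=%s Tc=%s Tr=%s Tt=%s\n", t[1], t[2], t[3], t[4], t[5] }'
# Here Tr (response time) accounts for almost all of Tt (total session time).
```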

We are not seeing any other errors in the logs and are wondering what can cause this and how to debug it further. For some reason the backend server seems unable to retrieve the data from HAProxy. We aren't able to reproduce this, and it happens only about once an hour. We have a few backend servers, and both the HAProxy server and the backends are mostly idle (load on the powerful HAProxy server is < 0.1).

Here is a part of our HAProxy config (we set the timeouts quite high on purpose):

mode http
log global
timeout connect 10s 
timeout client 600s
timeout server 1200s
timeout check 90s
option log-health-checks
option httplog
option log-separate-errors
default-server inter 90s rise 3 fall 3
option httpchk GET check.php

Everything else is quite standard. I should note that so far we are seeing this only for some POST requests, never for a GET request.

Update: The problem does not occur when we don't use HAProxy and send the traffic directly to a server. So it must be related to HAProxy somehow.

Some of the POST requests may be quite large. Is it possible that we need to adjust some buffer tuning? We have logged the content length of such requests using capture request header Content-Length len 10, and the length for one failing request is 1084028 bytes (just over 1 MB).

  • Your client and server timeouts are extremely large. Typical values are 60s or less. Why are these so high? What version of HAProxy? (`haproxy -v`) – Michael - sqlbot Feb 01 '18 at 02:25
  • HA-Proxy version 1.7.9-1~bpo8+1 2017/08/24. The timeouts are so large because we weren't yet 100% sure which values to set for which timeouts, as we sometimes have quite long-running requests (up to 10 minutes). Also, while testing HAProxy we didn't want to mark any servers as down; we will tighten this later. – holger359 Feb 01 '18 at 02:34
  • I have just been reading about the option http-buffer-request in https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4-option%20http-buffer-request I wonder if some users have a very slow connection, so the request never fully reaches HAProxy even though it was already forwarded to the server. We are considering enabling this option to see if the error goes away. – holger359 Feb 01 '18 at 02:36
  • `option http-buffer-request` did not fix the issue BTW. – holger359 Feb 01 '18 at 19:10
  • We have been analyzing the logs for quite a while and now think that these POST requests, which can be between 100 KB and 1 MB, are sent on the browser's "unload" event. This is related to some data we are tracking. We keep the browser busy in a loop for about a second to send the request, but depending on the internet connection it might not finish within that second, and the browser might abort it to load the new page. Any thoughts on whether this is plausible, and whether this is the error we would expect to see in HAProxy? – holger359 Feb 02 '18 at 05:54
  • Familiarize yourself with [Session state at disconnection](http://cbonte.github.io/haproxy-dconv/1.7/configuration.html#8.5) -- it is a very valuable tool. In your log entry, that's `CD--` -- this translates to the client aborted the connection, the connection was in the "data" stage. I'm not 100% certain whether "data" indicates request or response, but I *believe* it means the server had already returned complete response headers but the body was not complete (which might also mean offset 0). You should be able to duplicate this error with `curl` and a well-timed Control-C. – Michael - sqlbot Feb 02 '18 at 12:58
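
The curl reproduction suggested in the last comment can be sketched like this (hostname, URL path, and payload size are placeholders; aborting the upload mid-flight should yield a CD-- line in the HAProxy log):

```shell
# Sketch: POST a ~1 MB body through HAProxy, rate-limited so the
# transfer is still in flight, then abort after 2 s -- roughly what a
# browser does when it navigates away during an unload-event POST.
# haproxy.example.com and /file.php are placeholders for the real setup.
head -c 1084028 /dev/zero > /tmp/payload.bin
curl --max-time 2 --limit-rate 50k \
     --data-binary @/tmp/payload.bin \
     http://haproxy.example.com/file.php || echo "curl aborted (exit $?)"
```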

0 Answers