
I have a website, which queries a Varnish server, which queries an Apache server, which queries a db server.

At 07:00:00, a request is sent to the Apache server, which triggers a db request that takes over 30 seconds to process. While the db server is "locked", concurrent db requests pile up, causing Apache requests to pile up as well. So far, this is not my issue.

In the meantime, Varnish polls Apache every 5 seconds, with a 1 second timeout. The probe target is an empty html file.

The Apache log tells me that every poll is answered with a 200 status code.

I get the following results from the combined Varnish/Apache logs:

Polled at   Served at   Delay (s)
07:00:26    07:00:26            0
07:00:31    07:00:34            3
07:00:37    07:01:01           24
07:00:43    07:01:01           18
07:00:49    07:01:01           12
07:00:55    07:01:01            6
07:01:01    07:01:01            0
07:01:06    07:01:06            0

What I don't understand is the following :

  • Given that Apache serves every polling request, it should mean that MaxClients has not been reached. Otherwise, I guess Apache would reject any new incoming polling requests. Am I right?
  • If Apache can accept connections for the polling requests, why is the response delayed? Serving an empty html file should be as fast as usual, even if many concurrent requests are still waiting for the db to "unlock". The timing looks like Apache somehow needs the db to unlock, and the other pending requests to be served, before it can process the polling request.

The delay causes Varnish to believe that my server is "unhealthy", thus causing automatic rejection of all following requests, even though they could all be served within a 30-second delay.

Varnish config:

backend foo {
    .timeout = 60s;
    .probe = {
        .url = "/check.html";
        .interval = 5s;
        .timeout = 1s;
        .window = 10;
        .threshold = 8;
    }
}

Apache configuration:

Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers       20
    ServerLimit           200
    MaxClients            200
    MaxRequestsPerChild   0
</IfModule>

Don't hesitate to ask for more configuration information or logs.

Elvex

1 Answer


Apache does allow a queue of pending connections to build up if all of its HTTP workers are busy. This is controlled by the ListenBackLog directive:

https://httpd.apache.org/docs/current/mod/mpm_common.html#listenbacklog

So it's possible that the polling requests are entering this queue when all the other requests back up, and that is what causes your delay.
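For reference, a minimal sketch of what the directive looks like in a configuration file. The value shown is the documented default on most platforms and is purely illustrative, not a recommendation:

# Maximum length of the queue of connections waiting to be accepted
# by a worker; requests sitting here are not yet counted against
# MaxClients and don't show up in the access log until a worker
# picks them up.
ListenBackLog 511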

I would also enable the /server-status handler and monitor it while your server gets busy, as opposed to when it's already busy, since at that point Apache won't be able to serve the server-status page either.
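If it helps, here is a minimal sketch of enabling that handler, assuming mod_status is loaded; the Require line uses Apache 2.4 syntax with a placeholder address (on 2.2 you would use Order/Allow/Deny instead):

<IfModule mod_status_module>
    ExtendedStatus On
    <Location "/server-status">
        SetHandler server-status
        # Placeholder: restrict access to your own monitoring host
        Require ip 192.0.2.10
    </Location>
</IfModule>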

Another trick is to add %D to your access log format, as that will tell you the time (in microseconds) Apache took to serve a request, from when it first received it to when it completed the response.
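For example, something along these lines; the format nickname and log path are placeholders:

# %D = time taken to serve the request, in microseconds
LogFormat "%h %l %u %t \"%r\" %>s %b %D" combined_timing
CustomLog /var/log/apache2/access_timing.log combined_timing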

Unbeliever
  • Thanks for your answer. Today, I got confirmation that MaxClients is not reached when /check.html fails to respond in time. Does this invalidate your "ListenBackLog directive" hypothesis? I'll check your other suggestions as well. – Elvex Aug 27 '16 at 00:26
  • Yes, if MaxClients is not reached then the listen backlog shouldn't be in use. If the %D and server-status ideas confirm the request is really taking a long time to process, then something strange is going on. You could also try changing the probe request to a HEAD rather than a GET. – Unbeliever Aug 28 '16 at 11:54
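For what it's worth, a sketch of what the probe suggested in the last comment could look like in the VCL, with an explicit HEAD request replacing the .url line; the Host value is a placeholder:

.probe = {
    .request =
        "HEAD /check.html HTTP/1.1"
        "Host: example.com"
        "Connection: close";
    .interval = 5s;
    .timeout = 1s;
    .window = 10;
    .threshold = 8;
}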