Problem
Regular timeouts from the memcached backends in Nginx (see example errors below)
Things already tested:
- it occurs both on local and remote memcached servers so the network can't be the problem
- it occurs both with the memc and memcached backends
- it occurs both with and without the upstream keepalive module
- it occurs in nginx 0.8, 1.0.x and 1.1.x
- buffering logs appear to slightly decrease the amount of errors, but they are still there and this could simply be coincidence
- logging through udp (i.e. non-blocking) to another makes no difference
Other info:
- the memcached server handles ~500 requests per second from both this and other (remote) clients
- there are no errors/reports in the memcached logs
- memcached is started with the following settings:
- both tcp and udp are enabled (only tcp is used currently)
- 16GB of memory
- 16 threads (on a 32 core machine)
- limited to 8192 connections
- started in non-daemonic mode through supervisord
- The servers run FreeBSD 8.2
- The timeouts occur both during low and high loads (right now only about 50 request per second)
The question
So my question is... how can I debug these problems? Anyone has an idea where the underlying cause could be? The errors don't occur continuously but frequently enough to show up in the logs every couple of minutes.
Example errors from the logs:
2011/08/30 17:23:34 [error] 13921#0: *38602 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:23:34 [error] 13921#0: *38591 upstream timed out (60: Operation timed out) while reading response header from u`enter code here`pstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:23:34 [error] 13921#0: *38601 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:23:34 [error] 13921#0: *38918 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:23:35 [error] 13921#0: *38595 upstream timed out (60: Operation timed out) while connecting to upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:24:32 [error] 13921#0: *41768 upstream timed out (60: Operation timed out) while connecting to upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:24:36 [error] 13921#0: *38599 upstream timed out (60: Operation timed out) while connecting to upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:25:12 [error] 13921#0: *42489 upstream timed out (60: Operation timed out) while connecting to upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:25:30 [error] 13922#0: *39444 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:25:30 [error] 13922#0: *39452 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
Output from netstat -m
# netstat -m
3404/25531/28935 mbufs in use (current/cache/total)
3318/25004/28322/229376 mbuf clusters in use (current/cache/total/max)
1161/20599 mbuf+clusters out of packet secondary zone in use (current/cache)
26/4420/4446/192000 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
7591K/74070K/81661K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
4925075 requests for I/O initiated by sendfile
0 calls to protocol drain routines