10

We've recently been hitting our maxclients (300) on our server and are noticing a lot of connections are stuck in the R reading state. Here's an excerpt from our server-status dump:

Current Time: Monday, 29-Apr-2013 11:46:00 PDT
Restart Time: Monday, 29-Apr-2013 11:03:48 PDT
Parent Server Generation: 0
Server uptime: 42 minutes 12 seconds
Total accesses: 144170 - Total Traffic: 29.8 MB
CPU Usage: u188.25 s345.65 cu2601.11 cs0 - 124% CPU load
56.9 requests/sec - 12.0 kB/second - 216 B/request
244 requests currently being processed, 56 idle workers
KRR_RRRRRRRRRRRR_RRRKRRRR.......................................
K_RRK_RK_RKR_RRR_RRRR_R_R.......................................
RRRRRRRRRRRRRC_CR___RCRR_.......................................
R_RRRCRRRRRRRRCRRRCRRKRCR.......................................
R_K_RKR_CRRRRRRKC_RR_KR_R.......................................
RRRRKRR__RRRRKRRCR_RRRRRR.......................................
_RCRRRRRRRRRKR__RR_RRCRRR.......................................
RRRRRRRRCRRRRRRKRRRR_RCRR.......................................
R___RRCR__RR__RRRKC___RK_.......................................
_R_KCR__RR__R_R_RRRRRR_RK.......................................
RRR_RR___RCRRK_R_RWRR_RRC.......................................
RRKRCRRRRRRRRKRRRRRRRRCRR.......................................
................................................................
................................................................
................................................................
................................................................


Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

Srv PID Acc M   CPU SS  Req Conn    Child   Slot    Client  VHost   Request
0-0 10320   1/718/718   K   399.14  0   42  0.0 0.21    0.21    69.171.178.24   mysite.com  GET /home.php
0-0 10320   0/434/434   R   398.21  8   40  0.0 0.07    0.07    ?   ?   ..reading..
0-0 10320   0/548/548   R   331.52  558 45  0.0 0.12    0.12    ?   ?   ..reading..
0-0 10320   0/460/460   _   399.05  0   42  0.0 0.08    0.08    98.129.101.123  mysite.com  GET /home.php
0-0 10320   0/473/473   R   364.89  301 48  0.0 0.09    0.09    ?   ?   ..reading..
0-0 10320   0/616/616   R   396.18  26  46  0.0 0.11    0.11    ?   ?   ..reading..
0-0 10320   0/494/494   R   378.84  127 48  0.0 0.10    0.10    ?   ?   ..reading..
0-0 10320   0/231/231   R   311.62  687 43  0.0 0.04    0.04    ?   ?   ..reading..
0-0 10320   0/376/376   R   391.17  76  53  0.0 0.08    0.08    ?   ?   ..reading..
0-0 10320   0/377/377   R   380.72  176 53  0.0 0.07    0.07    ?   ?   ..reading..
0-0 10320   0/589/589   R   355.07  382 47  0.0 0.10    0.10    ?   ?   ..reading..
0-0 10320   0/587/587   R   393.16  56  50  0.0 0.13    0.13    ?   ?   ..reading..
0-0 10320   0/561/561   R   393.89  49  40  0.0 0.10    0.10    ?   ?   ..reading..
0-0 10320   0/616/616   R   389.88  81  41  0.0 0.22    0.22    ?   ?   ..reading..
0-0 10320   0/354/354   R   392.12  59  42  0.0 0.06    0.06    ?   ?   ..reading..
0-0 10320   0/346/346   R   377.51  192 44  0.0 0.06    0.06    ?   ?   ..reading..

Why are there so many in that R state? And does the SS column really mean that they have been reading for 558, 301, etc seconds?

Here's some of my server info and settings:

Server version: Apache/2.2.24 (Unix)
Server built:   Mar 22 2013 10:57:57
Cpanel::Easy::Apache v3.18.9 rev9999
Server's Module Magic Number: 20051115:31 Server loaded:  APR 1.4.6, APR-Util 1.4.1 Compiled using: APR 1.4.6, APR-Util 1.4.1
Architecture:   64-bit
Server MPM:     Worker
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with....
 -D APACHE_MPM_DIR="server/mpm/worker"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)  -D APR_USE_SYSVSEM_SERIALIZE  -D APR_USE_PTHREAD_SERIALIZE  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT  -D APR_HAS_OTHER_CHILD  -D AP_HAVE_RELIABLE_PIPED_LOGS  -D DYNAMIC_MODULE_LIMIT=128  -D HTTPD_ROOT="/usr/local/apache"
 -D SUEXEC_BIN="/usr/local/apache/bin/suexec"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"

StartServers 5
<IfModule prefork.c>
MinSpareServers 5
MaxSpareServers 10
</IfModule>
ServerLimit 300
MaxClients 300
MaxRequestsPerChild 10000
KeepAlive On
KeepAliveTimeout 1
MaxKeepAliveRequests 100

Here are compiled in modules:
  core.c
  mod_authn_file.c
  mod_authn_dbm.c
  mod_authn_dbd.c
  mod_authn_default.c
  mod_authn_alias.c
  mod_authz_host.c
  mod_authz_groupfile.c
  mod_authz_user.c
  mod_authz_dbm.c
  mod_authz_default.c
  mod_auth_basic.c
  mod_auth_digest.c
  mod_file_cache.c
  mod_cache.c
  mod_disk_cache.c
  mod_include.c
  mod_filter.c
  mod_deflate.c
  mod_log_config.c
  mod_log_forensic.c
  mod_logio.c
  mod_env.c
  mod_mime_magic.c
  mod_expires.c
  mod_headers.c
  mod_unique_id.c
  mod_setenvif.c
  mod_version.c
  mod_proxy.c
  mod_proxy_connect.c
  mod_proxy_ftp.c
  mod_proxy_http.c
  mod_proxy_scgi.c
  mod_proxy_ajp.c
  mod_proxy_balancer.c
  mod_ssl.c
  worker.c
  http_core.c
  mod_mime.c
  mod_dav.c
  mod_status.c
  mod_autoindex.c
  mod_asis.c
  mod_info.c
  mod_suexec.c
  mod_cgid.c
  mod_dav_fs.c
  mod_dav_lock.c
  mod_negotiation.c
  mod_dir.c
  mod_actions.c
  mod_speling.c
  mod_userdir.c
  mod_alias.c
  mod_rewrite.c
  mod_so.c

When we restart the reading requests slowly start building back up every time. Thanks for your help in advance!

UPDATE: here's netstat -pn:

tcp        0      0 198.57.162.52:80            54.235.31.205:41209         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            166.147.121.170:63246       TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            50.128.171.238:50751        FIN_WAIT2   -                   
tcp        0      0 198.57.162.52:80            89.241.135.84:24825         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            205.142.167.225:33256       TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            76.127.87.244:56493         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            205.153.156.222:21571       TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            108.12.147.14:61135         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:80            166.217.157.32:1630         TIME_WAIT   -                   
tcp        0      0 198.57.162.52:443           71.22.102.71:4252           TIME_WAIT   -               

Sample lsof output:

httpd      4092   nobody   25u     IPv4          711278095         0t0        TCP 198-57-162-52.unifiedlayer.com:http->4.sub-70-193-66.myvzw.com:12471 (ESTABLISHED)
httpd      4092   nobody   26u     IPv4          711350284         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c75-111-15-253.amrlcmta01.tx.dh.suddenlink.net:51298 (ESTABLISHED)
httpd      4092   nobody   27u     IPv4          711400163         0t0        TCP 198-57-162-52.unifiedlayer.com:http->108-235-145-129.lightspeed.irvnca.sbcglobal.net:56037 (FIN_WAIT2)
httpd      4092   nobody   28u     IPv4          711390092         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c-98-239-167-37.hsd1.pa.comcast.net:57988 (ESTABLISHED)
httpd      4092   nobody   29u     IPv4          711206563         0t0        TCP 198-57-162-52.unifiedlayer.com:http->252.sub-70-199-129.myvzw.com:6179 (ESTABLISHED)
httpd      4092   nobody   30u     IPv4          711271036         0t0        TCP 198-57-162-52.unifiedlayer.com:https->5.71.20.78:62114 (ESTABLISHED)
httpd      4092   nobody   31u     IPv4          711385050         0t0        TCP 198-57-162-52.unifiedlayer.com:http->64.128.135.115:17547 (ESTABLISHED)
httpd      4092   nobody   32u     IPv4          711400374         0t0        TCP 198-57-162-52.unifiedlayer.com:https->mbb5336d0.tmodns.net:36277 (ESTABLISHED)
httpd      4092   nobody   33u     IPv4          711378965         0t0        TCP 198-57-162-52.unifiedlayer.com:http->us-chi-exchange.brightstarcorp.com:47707 (ESTABLISHED)
httpd      4092   nobody   34u     IPv4          711394997         0t0        TCP 198-57-162-52.unifiedlayer.com:https->209-152-137-170.dsl.mcmo.net:64370 (ESTABLISHED)
httpd      4092   nobody   35u     IPv4          711391122         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c-76-125-136-230.hsd1.oh.comcast.net:57093 (ESTABLISHED)
httpd      4092   nobody   36u     IPv4          711398994         0t0        TCP 198-57-162-52.unifiedlayer.com:http->lookup.eatright.org:60021 (ESTABLISHED)
httpd      4092   nobody   37u     IPv4          711400972         0t0        TCP 198-57-162-52.unifiedlayer.com:http->modemcable126.224-130-66.mc.videotron.ca:55970 (ESTABLISHED)
httpd      4092   nobody   38u     IPv4          711379727         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c-75-65-146-83.hsd1.ms.comcast.net:50309 (ESTABLISHED)
httpd      4092   nobody   39u     IPv4          711390427         0t0        TCP 198-57-162-52.unifiedlayer.com:https->66-87-130-177.pools.spcsdns.net:41127 (ESTABLISHED)
httpd      4092   nobody   40u     IPv4          711401179         0t0        TCP 198-57-162-52.unifiedlayer.com:http->8.225.196.132:20608 (ESTABLISHED)
httpd      4092   nobody   41u     IPv4          711142701         0t0        TCP 198-57-162-52.unifiedlayer.com:http->c-24-15-33-77.hsd1.il.comcast.net:54631 (ESTABLISHED)
Shane N
  • 153
  • 1
  • 1
  • 9
  • 1
    What does netstat -pn show? – Zabuzzman Apr 29 '13 at 19:09
  • It shows a bunch of requests in the TIME_WAIT state - I've updated with excerpt above. – Shane N Apr 29 '13 at 19:22
  • The high values for SS indicate that the connections in TIME_WAIT have nothing to do with the workers in 'R' state. – Zabuzzman Apr 29 '13 at 19:37
  • What do the apache (error)-logs show? – Zabuzzman Apr 29 '13 at 19:38
  • In the process table of this server, do you see as much (php)-processes as 'R'-state workers? "ps auxww | grep php" – Zabuzzman Apr 29 '13 at 19:39
  • Apache error logs don't show anything except maybe an occasional "server reached MaxClients setting, consider raising the MaxClients setting" notice. – Shane N Apr 29 '13 at 19:47
  • Server process list shows only a few (2-5) apache processes, which match up to the connections in the W state in apache status. So how are those 'R' requests registering in apache if there hasn't been a thread spawned for them yet? – Shane N Apr 29 '13 at 19:50
  • Can you elaborate more on your comment about high SS values and TIME_WAIT?? – Shane N Apr 29 '13 at 19:50
  • To see all threads you need to see something like "ps -ALf" (assuming ps from procps package). Did you use that? – Zabuzzman Apr 29 '13 at 19:57
  • A TIME_WAIT connection will disappear after 4 min at most. – Zabuzzman Apr 29 '13 at 20:00
  • I used 'ps auxww' as you had suggested. Running with what I'm familiar with 'ps ax' produced the same results – Shane N Apr 29 '13 at 20:02
  • "ps aux" only shows processes (as I was considering a php problem), whereas "ps -ALf" also shows the threads within processes (as mpm_worker is multi-threaded) – Zabuzzman Apr 29 '13 at 20:16
  • Maybe with lsof you can find out where all these 'R' workers are waiting for? – Zabuzzman Apr 29 '13 at 20:18
  • Ran lsof, sample output above - I see about as many R requested in the ESTABLISHED state - does that indicate anything? – Shane N Apr 29 '13 at 20:34
  • Also, our ulimit is at 1024 - could that be why connections are waiting? – Shane N Apr 29 '13 at 20:38
  • With "ps -ALf", same results - only a few apache processes listed. – Shane N Apr 29 '13 at 20:38
  • Is `KeepAlive` on, and what is your `KeepAliveTimeout`? What are your `TimeOut` and `LimitRequestBody` values? Note that the default `TimeOut` is 300, which is pretty high - 60 is probably a better value. `LimitRequestBody` default is 0, which means unlimited request body size -- something smaller probably makes more sense, eg, a few MB's, or even 128 or 64kb if you don't use file uploads. – gregmac May 01 '13 at 19:53
  • We turned off KeepAlive yesterday and are still seeing the same R's. Our TimeOut is high - 900 - but don't understand why that would cause the reading state. I'll check LimitRequestBody and get back to you. Thanks! – Shane N May 01 '13 at 20:07
  • LimitRequestBody had not been set. We put it at 1MB and problem still exists. Any other suggestions?? – Shane N May 02 '13 at 17:52
  • For what it's worth, we began to experience a similar server-status output showing a dramatic increase in R's starting Saturday, February 22nd which appears to be caused by a change in the new major version of Chrome: https://bugs.chromium.org/p/chromium/issues/detail?id=1055256&q=timeout%20408&can=1 – jake_feyereisen Feb 26 '20 at 20:33

2 Answers2

17

@ShaneN Your server is reaching choke of resources point-of-failure. Maybe you're getting without ephemeral port range, or without system file-descriptors, which limits the number of sockets your server can use to answer requests. A reading state request means that the socket was opened, but the request line never reached the server, it's a known form of DoS called SlowLoris. To mitigate this kind of DoS, I'd first:

Add to /etc/sysctl.conf, the following entries:

net.ipv4.tcp_fin_timeout = 10 which sets the timeout of FIN sockets to 10seconds.

net.ipv4.ip_local_port_range = "15000 61000" which sets the ephemeral port range to about 46000 possible open ports per IP.

net.core.somaxconn = 1024 - default 128 - The net.core.somaxconn value has an important role. It limits the maximum number of requests queued to a listen socket.

net.core.netdev_max_backlog = 2000 - default 1000 - which increases the TCP queue of incoming requests.

After adding those lines to the file, issue sysctl -p to load the kernel parameters at the current runtime and check your rush hour to see if your problem's still there.

Marcel
  • 1,730
  • 10
  • 15
  • 1
    Wow, great instructions Marcel - will try this and let you know how it works... – Shane N May 06 '13 at 22:19
  • 4
    Marcel - thanks for your answer. We tried the steps above, and it did not solve the problem. But the SlowLoris info led us to an apache plugin Mod ReqTimeout (https://httpd.apache.org/docs/2.4/mod/mod_reqtimeout.html) that allowed us to set timeouts on the requests. Problem appears to be solved! Thank you! – Shane N May 08 '13 at 21:55
  • Nice that you've sort things out! But hey, you don't necessarily need this mod if you tweak the timeouts variables that are already available for you. IMO just tweaking your timeouts in KeepAlive settings (which is used by SlowLoris to thrash servers) and global timeout vars should do the trick! ;) – Marcel May 09 '13 at 00:13
0

This may be a denial-of-service attack meant to occupy listening sockets and tie up your server. If you can determine that to be the case, I'd suggest trying a frontend proxy such as nginx to take the stress off of Apache. Proxies like this can handle a massive amount of connections and will at least get you some breathing room to figure out what's going on.

kenchilada
  • 439
  • 3
  • 5
  • We pointed our domain name to another server and the same behaviour is happening. Do most DOS attacks happen via IP or domain? – Shane N May 02 '13 at 17:53
  • DOS attacks on L7 are more effective today - 50/50, Scans - IP – GioMac May 05 '13 at 03:18