
Short version: I upgraded from Fedora Server 23 to Fedora Server 24, and now my Apache (httpd-2.4.23-4.fc24.x86_64) error_log is filling up with these messages:

[Mon Nov 28 20:30:39.486187 2016] [mpm_worker:crit] [pid 9973:tid 140499117635328] (22)Invalid argument: AH03139: ap_queue_pop failed
[Mon Nov 28 20:30:39.486197 2016] [mpm_worker:crit] [pid 9973:tid 140499117635328] (22)Invalid argument: AH03139: ap_queue_pop failed

(It set off disk space alerts, and I pruned error_log when it hit 1.8 GB!)

I am using the worker MPM with these settings:

ServerLimit         60
MaxRequestWorkers   1500
MinSpareThreads     10
MaxSpareThreads     25
MaxRequestsPerChild 10000
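As a rough sanity check of these numbers, the sketch below computes worker capacity. It assumes ThreadsPerChild is left at its default of 25, since the settings above do not set it; under that assumption 60 × 25 = 1500, exactly MaxRequestWorkers. It also applies the documented rule that mpm_worker raises MaxSpareThreads to at least MinSpareThreads + ThreadsPerChild, so the configured 25 would effectively become 35. `check_worker` and `WorkerReport` are hypothetical helper names, not part of Apache.

```python
from dataclasses import dataclass

# Default ThreadsPerChild for mpm_worker. The question's config does not set
# it, so using this value is an assumption.
DEFAULT_THREADS_PER_CHILD = 25


@dataclass
class WorkerReport:
    capacity: int             # ServerLimit * ThreadsPerChild
    fits: bool                # MaxRequestWorkers <= capacity
    effective_max_spare: int  # MaxSpareThreads after worker's lower bound


def check_worker(server_limit: int, max_request_workers: int,
                 min_spare: int, max_spare: int,
                 threads_per_child: int = DEFAULT_THREADS_PER_CHILD) -> WorkerReport:
    """Rough consistency check of mpm_worker sizing directives (a sketch)."""
    capacity = server_limit * threads_per_child
    # mpm_worker corrects MaxSpareThreads up to MinSpareThreads + ThreadsPerChild.
    effective_max_spare = max(max_spare, min_spare + threads_per_child)
    return WorkerReport(capacity, max_request_workers <= capacity,
                        effective_max_spare)


if __name__ == "__main__":
    # Values from the question.
    print(check_worker(server_limit=60, max_request_workers=1500,
                       min_spare=10, max_spare=25))
```

If ThreadsPerChild is set elsewhere in the configuration, pass it explicitly instead of relying on the default.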

ulimit -a

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 29966
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 29966
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
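`ulimit -a` reports the limits of the shell it runs in, which need not match those of an httpd started at boot. One way to see what a running httpd process actually has is to read `/proc/<pid>/limits`. A minimal sketch, assuming Linux and the usual layout of that file (a "Max ..." name column followed by soft and hard values); `parse_proc_limits` and `read_limits` are hypothetical helper names.

```python
import re
from typing import Dict, Tuple

# One row of /proc/<pid>/limits: a name such as "Max open files", then the
# soft and hard values separated by runs of spaces; units are optional.
_ROW = re.compile(r"^(Max [A-Za-z ]+?)\s{2,}(\S+)\s+(\S+)")


def parse_proc_limits(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each limit name to its (soft, hard) values, kept as strings."""
    limits: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:  # the header row ("Limit  Soft Limit ...") is skipped
            name, soft, hard = match.groups()
            limits[name] = (soft, hard)
    return limits


def read_limits(pid: int) -> Dict[str, Tuple[str, str]]:
    """Read the limits the kernel applies to a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_proc_limits(f.read())
```

Calling `read_limits` with the PID of an httpd child and comparing "Max open files", "Max processes" and "Max pending signals" against the shell output above shows whether the service really runs with those limits.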

Any advice would be appreciated. :)


Apache Server Status for example.com (via x.x.x.x)

Server Version: Apache/2.4.23 (Fedora) OpenSSL/1.0.2j-fips PHP/5.6.28
Server MPM: worker
Server Built: Jul 18 2016 15:38:14
Current Time: Wednesday, 30-Nov-2016 12:48:45 AST
Restart Time: Tuesday, 29-Nov-2016 18:21:57 AST
Parent Server Config. Generation: 3
Parent Server MPM Generation: 2
Server uptime: 18 hours 26 minutes 48 seconds
Server load: 6.60 6.48 6.33
Total accesses: 2278650 - Total Traffic: 20.6 GB
CPU Usage: u488.03 s82.94 cu4.36 cs14.77 - .889% CPU load
34.3 requests/sec - 325.3 kB/second - 9.5 kB/request
146 requests currently being processed, 79 idle workers
_W_K_KKKCKKKKK_K__KKK___CKK_KKK__K_K___K___CKK_W_K_K__C_KK___K_K
KKCK_CK__CKKWCK__WKKKKKKW_WWKKKKC_____K__K_KKKKKKKWK_K_KK__KK...
......................KKKKKKKK_K_KKC____KKKCKKKKK__KCKKKKC_KC__K
_KKKK_CKK_KK__K_KKKK___RKCCK_K_KK_KC_K_KKK_K__W_K_KK__KKKK......
................................................................
................................................................
................
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
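The scoreboard can be tallied by state to cross-check the summary line: 146 requests being processed plus 79 idle workers is 225 live worker threads, which would be 9 child processes if ThreadsPerChild is the default 25 (an assumption, as above). A small sketch; `tally_scoreboard` and `busy_and_idle` are hypothetical helpers, and the busy count here is an approximation of mod_status's own rules.

```python
from collections import Counter
from typing import Dict, Tuple

# Scoreboard key as printed by mod_status in the status page above.
STATES = {
    "_": "waiting", "S": "starting", "R": "reading", "W": "sending",
    "K": "keepalive", "D": "dns", "C": "closing", "L": "logging",
    "G": "graceful", "I": "idle_cleanup", ".": "open_slot",
}


def tally_scoreboard(scoreboard: str) -> Dict[str, int]:
    """Count slots per state; newlines and unknown characters are ignored."""
    counts = Counter(ch for ch in scoreboard if ch in STATES)
    return {STATES[ch]: n for ch, n in counts.items()}


def busy_and_idle(scoreboard: str) -> Tuple[int, int]:
    """Approximate the status page's summary line.

    Idle = slots waiting for a connection ("_"); busy = every other occupied
    slot. mod_status applies its own rules, so treat this as a rough check.
    """
    tally = tally_scoreboard(scoreboard)
    idle = tally.get("waiting", 0)
    busy = sum(n for state, n in tally.items()
               if state not in ("waiting", "open_slot"))
    return busy, idle


if __name__ == "__main__":
    sample = "_W_K\nKC.."  # hypothetical fragment, not the real scoreboard
    print(tally_scoreboard(sample))
    print(busy_and_idle(sample))
```

Pasting the full scoreboard string from the status page in place of `sample` gives per-state counts; a large share of "K" slots indicates threads held by keep-alive connections.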

EDIT: In the end, I concluded that the server was unstable overall, scrapped it and built a new one from scratch. I was never able to solve the issue.

Mike Bobbitt
  • This is just a guess, but try reducing the stack size to 256 KB with `ulimit -s 256`. A few configuration tips as well: if you expect as many as 1500 request workers, increase MaxSpareThreads to something much bigger, such as 400 or 600. MaxRequestsPerChild is also certainly low: under heavy load, child processes will probably be respawning very often with your current setup, so set MaxRequestsPerChild to 10 million, or to 0 (unlimited). Try all this and let me know. – Daniel Ferradal Nov 29 '16 at 09:36
  • Thanks, I have tried all of those settings and there is no change; I still see a flood of "Invalid argument: AH03139: ap_queue_pop failed" messages. I added a snippet from server-status to the original post, but I'm not sure it adds any insight. There are also fairly regular segfaults being reported, which I originally missed in the flood of other messages. I have CoreDumpDirectory set, but no core dumps seem to be appearing. – Mike Bobbitt Nov 30 '16 at 16:59
  • Scratch the above... I saw some evidence of a kernel problem, so I reinstalled the kernel and am now seeing different messages: `[Wed Nov 30 13:14:44.863570 2016] [mpm_worker:alert] [pid 7949:tid 139846088509632] (11)Resource temporarily unavailable: AH00282: apr_thread_create: unable to create worker thread` and `[Wed Nov 30 13:14:47.822725 2016] [core:notice] [pid 356:tid 139846088509632] AH00051: child pid 5238 exit signal Segmentation fault (11), possible coredump in /var/www/apache-dumps`. I'm investigating that issue, but it looks like an improvement (I am actually getting the dump files now). – Mike Bobbitt Nov 30 '16 at 17:14
  • Spoke too soon, still seeing a spate of `(22)Invalid argument: AH03139: ap_queue_pop failed` messages. – Mike Bobbitt Nov 30 '16 at 23:04
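The MaxRequestsPerChild point in Daniel Ferradal's comment can be put in rough numbers using the status page: 34.3 requests/sec, and 9 child processes if the 225 live workers are split 25 per child (an assumption). The sketch below assumes load is spread evenly and that every request counts toward the limit. In httpd 2.4 the limit actually counts connections (only the first request of a keep-alive connection counts), so the real interval is longer and this is a worst case. `seconds_between_recycles` is a hypothetical helper.

```python
def seconds_between_recycles(requests_per_sec: float, children: int,
                             max_per_child: int) -> float:
    """Worst-case lifetime of one child before MaxRequestsPerChild retires it.

    Assumes requests are spread evenly across children and that each one
    counts toward the limit (keep-alive makes the real interval longer).
    """
    if max_per_child == 0:  # 0 means "never recycle"
        return float("inf")
    per_child_rate = requests_per_sec / children
    return max_per_child / per_child_rate


if __name__ == "__main__":
    secs = seconds_between_recycles(34.3, children=9, max_per_child=10000)
    print(f"about {secs:.0f} s ({secs / 60:.0f} min) per child")
```

With these inputs each child lives roughly 44 minutes at worst, so recycling alone happens every few minutes across the server rather than constantly.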

1 Answer


It seems that your system limits are not high enough, which causes problems for the Apache processes. You might want to increase the values of max user processes, pending signals and open files. Here are some preferable values:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1546671
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 102400
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1546671
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
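To see which of these suggestions would actually raise a limit on the asker's server, the two `ulimit -a` listings can be compared directly. A small sketch; the dictionary keys are taken from the `ulimit -a` labels, "unlimited" is treated as infinity, and `limits_below` and `as_number` are hypothetical helpers.

```python
from typing import Dict, List, Union

Limit = Union[int, str]  # an integer value or the string "unlimited"


def as_number(value: Limit) -> float:
    """Treat 'unlimited' as infinity so it compares above any finite value."""
    return float("inf") if value == "unlimited" else float(value)


def limits_below(current: Dict[str, Limit],
                 suggested: Dict[str, Limit]) -> List[str]:
    """Return the sorted names of limits whose current value is lower than suggested."""
    return sorted(name for name, want in suggested.items()
                  if name in current and as_number(current[name]) < as_number(want))


if __name__ == "__main__":
    # From the question's ulimit -a output.
    current = {"pending signals": 29966, "open files": 4096,
               "max user processes": 29966, "stack size": 8192,
               "core file size": "unlimited"}
    # From the values suggested in this answer.
    suggested = {"pending signals": 1546671, "open files": 102400,
                 "max user processes": 1546671, "stack size": 10240,
                 "core file size": 0}
    print(limits_below(current, suggested))
```

The comparison only flags increases. The suggested core file size of 0 is a decrease and would disable the core dumps the asker was trying to collect.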
DevDavid