Apache died last night. The error log shows this:
[alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 48
From browsing around the Internet, everyone seems to say this is an issue with ulimit in Linux. If I understand correctly, ulimit by default limits any non-root user to 1024 concurrent processes. If a user (apache in this case) reaches that maximum, no more processes can be created for it, which would explain why Apache couldn't setuid to the apache user when it tried to spawn a new child.
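One way to check this theory is to compare the limit the running Apache process actually has with the number of tasks the apache user currently owns. A minimal sketch, assuming a Linux /proc filesystem and the procps version of ps; nproc_limit_of and task_count_of are hypothetical helper names:

```shell
#!/bin/sh
# Print the soft "Max processes" (nproc) limit of a running process.
# Usage: nproc_limit_of PID
nproc_limit_of() {
    # Line format in /proc/PID/limits: "Max processes  <soft>  <hard>  processes"
    awk '/^Max processes/ { print $3 }' "/proc/$1/limits"
}

# Count tasks (processes and threads) owned by a user, selected by real UID.
# On Linux the nproc limit counts threads as well as processes.
# Usage: task_count_of USER
task_count_of() {
    ps -L -U "$1" -o pid= | wc -l
}
```

For example, `nproc_limit_of "$(pgrep -o httpd)"` would report the limit of the oldest httpd process (normally the parent), assuming pgrep is available, and `task_count_of apache` the current task count. The value shown is what the process inherited when it was started, which is not necessarily what limits.conf specifies.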
Last night, we had a user of our web app make 1100+ GET requests to the same page in the space of about 1 minute and that's when the server died.
My Apache config file has this:
<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 2000
MaxClients 2000
MaxRequestsPerChild 4000
</IfModule>
KeepAlive On
MaxKeepAliveTimeout 5
KeepAliveTimeout 5
If my MaxClients is 2000 and MaxSpareServers is 20, then for Apache to have hit the ulimit for processes it would have needed 1000+ busy processes. I simply don't see that happening, considering these GET requests were tiny and the server could easily handle 20-30 requests a second. On top of that, MaxRequestsPerChild is set to 4000, so it shouldn't be spawning so many new children, right? So was ulimit really the reason it couldn't setuid?
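As a rough sanity check on that reasoning, the number of busy workers can be estimated as the arrival rate times the time each connection holds a worker (Little's law). The sketch below assumes the worst case, where every request arrives on its own connection and holds a prefork worker for the full 5-second KeepAliveTimeout; hold_s is that assumption, not a measured value:

```shell
#!/bin/sh
# Rough estimate of busy prefork workers:
#   busy workers ~= arrival rate x time each connection holds a worker
requests=1100   # from the incident
window_s=60     # "about 1 minute"
hold_s=5        # ASSUMPTION: each connection idles for the whole KeepAliveTimeout

# Round the rate up so the estimate errs on the high side.
rate=$(( (requests + window_s - 1) / window_s ))
busy=$(( rate * hold_s ))

echo "rate=${rate} req/s busy_workers=${busy}"
```

Under these assumptions the estimate is about 95 busy workers, well short of 1024. Slow requests or a stalled backend would raise the hold time and change the picture.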
I've used Apache's ab tool to try to reproduce this on a much less powerful local build, and I can't. Even the less powerful hardware can handle thousands of connections in one minute with OK performance. It gets sluggish if I try to access the server via the web at the same time, but Apache doesn't go belly up.
So here are my questions:
- Is there a better way to configure the prefork module? We just migrated from an older server to this newer server. The older one had ServerLimit and MaxClients set to 512. I figured I might as well set them to 2000 to avoid having to change them again in the foreseeable future.
- I've tried adjusting ulimit nproc to a LOW number to try to reproduce the error. In limits.conf I've set * hard nproc 15, but Apache would still spawn about 30 children when I ran ab. Am I missing something?
- Despite number 2, is the solution here to raise the nproc setting to something higher, like 10000?
- I would also like to implement something that will prevent users from making so many requests. In the past I've used iptables to block users who made more than 5 SSH connection attempts in a minute. I thought about doing something similar for HTTP requests, but I wanted to know if there are better solutions out there. I've heard of fail2ban, mod_security and the like, but I'm not sure what they do. Would they benefit me in this case? Any other suggestions?
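For reference, a minimal sketch of what the iptables approach might look like for HTTP, using the recent match in the same way as the common SSH recipe. It only prints the commands so they can be reviewed before being run as root. The list name http_limit, the helper name http_rate_rules and the thresholds are all illustrative assumptions:

```shell
#!/bin/sh
# Dry run: print iptables commands that drop a source IP once it has opened
# roughly HITS new connections to PORT within SECONDS. Nothing is applied.
# Usage: http_rate_rules PORT SECONDS HITS
http_rate_rules() {
    port=$1 seconds=$2 hits=$3
    # http_limit is a hypothetical name for the recent-match address list.
    match="-p tcp --dport $port -m state --state NEW -m recent --name http_limit"
    # Record every new connection against its source address.
    echo "iptables -A INPUT $match --set"
    # Drop the connection if that source is at or over the threshold.
    echo "iptables -A INPUT $match --update --seconds $seconds --hitcount $hits -j DROP"
}

http_rate_rules 80 60 20
```

Several caveats apply. The rules count new TCP connections, not HTTP requests, so requests reusing a keep-alive connection are not counted. The rules also need to sit ahead of any rule that already accepts port 80 traffic. Finally, the recent module remembers 20 packets per address by default, so a hitcount above 20 would need its ip_pkt_list_tot parameter raised.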
Server Info: RHEL 6.3, Apache 2.2.15