
My server runs Ubuntu 20.04 with a pure LAMP stack and Apache 2.4.41. In the last few weeks there have been a total of two occurrences where Apache2 became unresponsive (users couldn't load our website). We couldn't figure out why, but it started working again after I restarted Apache2 (systemctl restart apache2). I checked and MySQL was up, so I feel it's purely Apache2 reaching some limit and becoming unresponsive.

So I started digging around and logging the process count, namely logging the output of the command below

ps aux | grep apache | wc -l

into a text file every 5 seconds.

The command returns the number of processes whose command line contains the word "apache", which serves as a rough count of the Apache processes active at that moment.
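
For reference, the logging is essentially a loop along these lines (the log file path is just an example, and note that grep apache usually counts the grep process itself as one extra, which is why this sketch uses pgrep -c instead):

#!/bin/bash
# Append a timestamped Apache process count every 5 seconds.
# pgrep -c counts matching processes without including itself in the result.
while true; do
  echo "$(date '+%F %T') $(pgrep -c apache2)" >> /var/log/apache-proc-count.log
  sleep 5
done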

The usual process count ranges from about 90 (off-peak) to 250-300 (peak). But occasionally (twice now, since we started logging) it climbs to 700; the trend is 90 > 180 > 400 > 700, nearly doubling every 5 seconds.

I have checked the Apache error logs, syslog, access logs and so on, and failed to find any useful information. Initially I suspected a DDoS, but I couldn't find anything to "prove" that it is one.

A little info about my server config (a sketch of the corresponding config block follows the list) -

  • uses the default mpm_prefork
  • MaxKeepAliveRequests 100
  • KeepAliveTimeout 5
  • ServerLimit 1000
  • MaxRequestWorkers 1000 (increased recently to "solve" the spike, it was 600 previously)
  • MaxConnectionsPerChild 0
  • MaxSpareServers 10
  • No firewall (ufw) or mod_evasive enabled.
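
Put together, the prefork settings amount to roughly this block in mpm_prefork.conf (StartServers and MinSpareServers are not settings I changed, so the values shown for them are just the assumed stock defaults):

<IfModule mpm_prefork_module>
    StartServers             5      # assumed default, not explicitly set
    MinSpareServers          5      # assumed default, not explicitly set
    MaxSpareServers          10
    ServerLimit              1000
    MaxRequestWorkers        1000   # raised from 600 after the first spike
    MaxConnectionsPerChild   0      # children are never recycled
</IfModule>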

Here come my questions:

  1. Is there any way I can find out what is causing the spike if there is nothing useful in the logs at all? My feeling is that certain Apache processes get stuck and Apache keeps spawning new child processes, if that's how it works (sorry, not very familiar with server stuff).

  2. I noticed that, after a spike, the number of processes doesn't go down immediately; instead it seems to decrease by 3-5 processes every 5 seconds and takes around 9-10 minutes to get from 700 processes back down to 100. I'm not sure why, but which config should I tweak to make the processes "die" faster? I was hoping that, if the processes "die" fast enough, then even during a sudden spike my server would only be "down" for around 5-10 seconds at most. From what I've read, my setting of KeepAliveTimeout 5 should kill them fast enough, so why do they linger for up to 10 minutes? Should I set MaxConnectionsPerChild to something other than 0 (unlimited)? (The kind of change I have in mind is sketched below.)
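
To be concrete, this is roughly the tweak I'm considering for #2 (the values are guesses on my part, not numbers I've tested or seen recommended):

# In mpm_prefork.conf -- hypothetical values, just to illustrate the idea
MaxSpareServers          10      # surplus idle children above this are gradually killed off
MaxConnectionsPerChild   1000    # recycle each child after 1000 connections instead of never (0 = unlimited)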

My current approach is to find a way to implement #2 and a way to "prove" that processes are dying faster than they used to during a spike. Secondly, maybe implement a firewall to prevent a DDoS, if it really is one.
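
For the "proving" part, I'm thinking of just post-processing the existing count log, along the lines of the snippet below (it assumes the timestamp-plus-count format from the logging sketch above):

# Print each interval where the count dropped, and by how much,
# so decay rates before/after a config change can be compared.
awk 'NR > 1 && $3 < prev { printf "%s %s %s (-%d)\n", $1, $2, $3, prev - $3 } { prev = $3 }' /var/log/apache-proc-count.log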

Thanks

  • You don't mention what [mpm](https://serverfault.com/q/383526/37681) you're using, but be aware that the typical default, prefork, isn't the best when you get many concurrent requests – HBruijn May 12 '23 at 09:36
  • @HBruijn Sorry, missed that out, but yes, I am using the default mpm_prefork.conf. I did not do much configuration on this server, as I mainly used it as given by AWS to host my website, and I just tweak it as and when needed (like now) – Patrick Teng May 12 '23 at 09:44

0 Answers