There are short-term and medium-term ways to troubleshoot this problem.
In addition, if you want useful help from this site, I would suggest providing more information: your site configuration files, logfiles, and the error messages you saw when running commands. For example, cut and paste the text of the output from the graceful restart command.
Short Term
The best way to troubleshoot a problem that has happened previously, and is not happening now, is through logfiles.
The main apache2 error logfile is at /var/log/apache2/error.log
and you may also have a VirtualHost-specific error log configured:
# grep ErrorLog /etc/apache2/sites-enabled/*.conf
/etc/apache2/sites-enabled/mysite1.org.conf: ErrorLog /var/log/apache2/mysite1.org-error.log
/etc/apache2/sites-enabled/mysite2.org.conf: ErrorLog /var/log/apache2/mysite2.org-error.log
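These logs can be large, so it may help to filter them down to the serious entries first. The following is only a sketch: apache_errors is a made-up helper name, and the pattern assumes the stock ErrorLogFormat, where the level appears as [module:level] in apache 2.4 or [level] in 2.2. If you have customised the log format, adjust the pattern.

```shell
#!/bin/sh
# Sketch: print only the entries at level error or worse from an Apache error log.
# apache_errors is a hypothetical helper name, not part of Apache.
# The pattern accepts "[module:level]" and "[:level]" (2.4) as well as "[level]" (2.2).
apache_errors() {
    grep -E '\[([a-z_0-9]*:)?(error|crit|alert|emerg)\]' "$1"
}

# Example (log path taken from the grep output above):
#   apache_errors /var/log/apache2/mysite1.org-error.log | tail -n 50
```

Note the timestamps of anything this turns up, and compare them with the times the site was unavailable.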
Errors relating to service restarts will be logged to the systemd journal:
# journalctl -u apache2
-- Logs begin at Sun 2018-07-08 01:35:01 UTC, end at Mon 2018-07-09 21:39:06 UTC. --
Jul 08 06:25:01 devhost1 systemd[1]: Reloading LSB: Apache2 web server.
Jul 08 06:25:01 devhost1 apache2[10537]: * Reloading Apache httpd web server apache2
Jul 08 06:25:02 devhost1 apache2[10537]: *
Jul 08 06:25:02 devhost1 apache2[2313]: DIGEST-MD5 common mech free
Jul 08 06:25:02 devhost1 systemd[1]: Reloaded LSB: Apache2 web server.
To look at a particular period of time, use --since and --until:
# journalctl -u apache2 --since "2018-07-06 10:30:01" --until "2018-07-06 11:30:01"
Medium Term
Your description suggests some sort of resource exhaustion problem that accumulates over time. The likely candidates are memory or file descriptors running out, or apache being unable to serve requests fast enough because of a lack of CPU or I/O capacity, so that they queue up and time out.
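For a quick spot check of the first two candidates, you can read the numbers straight from /proc. This is a rough sketch rather than a monitoring solution: proc_usage is a made-up helper name, it only works on Linux, and it needs to run as root to read the file descriptors of processes owned by another user.

```shell
#!/bin/sh
# Sketch: report open file descriptors, the soft fd limit, and resident memory
# for one PID. Reads only /proc, so it is Linux-specific.
# proc_usage is a hypothetical helper name.
proc_usage() {
    pid=$1
    # Each entry in /proc/PID/fd is one open file descriptor.
    fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # Fourth field of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    # VmRSS is the resident set size, reported in kB.
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    echo "pid=$pid fds=$fds fd_soft_limit=$limit rss_kb=$rss"
}

# Example: one line per apache2 process (assumes pgrep is installed).
#   for pid in $(pgrep apache2); do proc_usage "$pid"; done
```

If fds creeps towards fd_soft_limit, or rss_kb keeps growing between restarts, that points at which resource is being exhausted.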
So it's generally useful to track these values using a tool installed on the box. Personally I would use munin because I am familiar with it; it's quite old, but it will do the trick.
Another tool for tracking CPU, I/O, and memory is the sysstat package, which logs system statistics that you can compare against your downtime periods.
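Once sysstat has been collecting for a while, sar can replay the figures for a given window. The sketch below assumes a Debian/Ubuntu layout, where the daily data files live in /var/log/sysstat and are named saDD; other distributions and newer sysstat versions may use a different directory or naming scheme, and on Debian/Ubuntu collection may first need enabling in /etc/default/sysstat. sa_file is a made-up helper name.

```shell
#!/bin/sh
# Sketch: map a date (YYYY-MM-DD) to the sysstat data file for that day.
# Assumes the Debian/Ubuntu location /var/log/sysstat and the saDD naming scheme;
# sa_file is a hypothetical helper name.
sa_file() {
    day=${1##*-}                      # keep the text after the last "-", e.g. 06
    echo "/var/log/sysstat/sa$day"
}

# Example: CPU (-u), memory (-r), and I/O (-b) figures for a one-hour window.
#   sar -u -f "$(sa_file 2018-07-06)" -s 10:30:00 -e 11:30:00
#   sar -r -f "$(sa_file 2018-07-06)" -s 10:30:00 -e 11:30:00
#   sar -b -f "$(sa_file 2018-07-06)" -s 10:30:00 -e 11:30:00
```

Use the same window you passed to journalctl, so the system statistics and the service log can be read side by side.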