I usually have Nagios agents installed on all our Linux servers, so we get detailed real-time reports of what's happening on them, along with historical data.
However, there is one RHEL 7 server on which we can't install a Nagios agent (or monitor it over SSH, etc.), and on this server the load average spikes once every few days. It is a web server, and we only find out when users complain that the site is loading slowly. In most cases, by the time we log in and check, the load is back to normal.
Is there any way, using the readily available OS tools and logs, I can find out what caused the load to shoot up?
I have gone through pretty much all the log files, including the Apache logs, but I can't find anything obvious in them.
Are there any tools or daemons that could give me more information about such incidents?
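
To make the question concrete, this is roughly what I could run from cron myself if nothing built in does the job. It is only a minimal sketch: the threshold, the log path, and the function names are my own placeholders, and it assumes a Python interpreter and the stock `ps` from procps-ng are present on the box. It reads `/proc/loadavg` and, when the 1-minute load is above the threshold, appends a process listing to a file so there is something to look at after the spike has passed.

```python
import subprocess
import time

# Hypothetical values; tune the threshold to the number of cores.
LOAD_THRESHOLD = 4.0
LOG_PATH = "/var/tmp/load_snapshots.log"


def parse_loadavg(text):
    """Return the (1min, 5min, 15min) load averages from /proc/loadavg content."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the kernel's load average file."""
    with open(path) as f:
        return parse_loadavg(f.read())


def process_snapshot():
    """Return a process listing sorted by CPU usage, highest first.

    The STAT column matters: on Linux, processes in state D
    (uninterruptible sleep, usually waiting on I/O) count toward
    the load average even when they use no CPU.
    """
    out = subprocess.check_output(
        ["ps", "-eo", "pid,ppid,stat,pcpu,pmem,comm", "--sort=-pcpu"]
    )
    return out.decode("utf-8", "replace")


def check_once(threshold=LOAD_THRESHOLD, log_path=LOG_PATH):
    """Append a snapshot to log_path if the 1-minute load is at or above threshold.

    Returns True when a snapshot was written, False otherwise.
    """
    one_min, five_min, fifteen_min = read_loadavg()
    if one_min < threshold:
        return False
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write("=== %s load: %.2f %.2f %.2f ===\n"
                  % (stamp, one_min, five_min, fifteen_min))
        log.write(process_snapshot())
        log.write("\n")
    return True
```

A cron entry that calls `check_once()` every minute would be enough to catch a spike lasting more than a minute or so, but I would rather use something that already ships with the OS if it exists.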