We have a dedicated machine that mostly serves as a web server. It runs Plesk for several domains, our web servers, and the Munin central node, which connects to around 10 other machines running munin-node.
Today our server became unresponsive. Every request to any website or to the mail servers timed out. SSH also timed out, and users complained that they could no longer play.
I issued a hard reset via the provider dashboard, and after a while everything was back up again. I then checked the syslog. Our monitoring services reported the first timeout at 11:36, and the last two syslog entries before that time are these:
Jul 7 11:30:19 xxx CRON[7666]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
Jul 7 11:30:30 xxx CRON[7671]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Could Munin somehow be at fault for the server becoming unresponsive? If so, how could we tackle the issue?
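In case it helps with the diagnosis, here is a minimal sketch of how the entries leading up to the first timeout can be pulled out of the syslog. It assumes the traditional "Mon D HH:MM:SS host ..." timestamp format and a log at /var/log/syslog; syslog_window is a hypothetical helper name used only for illustration, not part of Munin or any other tool.

```shell
# Hypothetical helper: print syslog lines for one day whose time falls
# within [start, end]. Assumes the traditional "Mon D HH:MM:SS" prefix.
# Usage: syslog_window MONTH DAY START END [FILE]
syslog_window() {
    month="$1"; day="$2"; start="$3"; end="$4"
    file="${5:-/var/log/syslog}"   # assumed default log location
    # The day is compared numerically; HH:MM:SS values are compared as
    # strings, which sorts correctly because they are zero-padded.
    awk -v m="$month" -v d="$day" -v s="$start" -v e="$end" \
        '$1 == m && $2 == d && $3 >= s && $3 <= e' "$file"
}

# Example: everything logged on Jul 7 between 11:25:00 and 11:36:59
# syslog_window Jul 7 11:25:00 11:36:59
```

Widening the window this way shows whether anything other than the two cron jobs was logged shortly before 11:36.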