Over the last couple of days, a CentOS 6.7 mail server that I monitor has been continuously timing out on SNMP queries. No changes were made to the server in the period immediately preceding the change of behaviour (I know...), so I am inclined to blame "something" in the environment.
If I restart the daemon, it is responsive again for anywhere from a few minutes to a couple of hours, then it starts timing out again. This also happens for queries run from the machine itself, as in:
# snmpstatus -v1 -c public localhost
(so no network issues in the picture). I have nothing of note in dmesg and the only things I can see in /var/log/messages - that are not ordinary snmp connect traces - are occasional:
Mar 22 17:34:53 turnip snmpd[31053]: read:Interrupted system call
lines which appear related to me restarting the daemon.
I tried to strace snmpd and I can see it waiting in what appears to be a select/receive loop. When unresponsive, it never gets out of there and it does not write anything in the logs; it is as if packets are not delivered to the daemon. Rebooting the machine has no effect either.
Tweaking the open files limit and surveying other possible resource limits has also been ineffective, and the machine itself is not particularly stressed. So I am currently out of clues.
I can post snmpd.conf if needed.
TIA & cheers
Edit: this is what the traced loop looks like (while unresponsive):
select(15, [14], NULL, NULL, {0, 27618}) = 0 (Timeout)
select(15, [14], NULL, NULL, {1, 0}) = 0 (Timeout)
select(15, [14], NULL, NULL, {1, 0}) = 0 (Timeout)
select(15, [14], NULL, NULL, {1, 0}) = 0 (Timeout)
select(15, [14], NULL, NULL, {1, 0}) = 0 (Timeout)
select(15, [14], NULL, NULL, {1, 0}) = 0 (Timeout)
select(15, [14], NULL, NULL, {1, 0}) = 0 (Timeout)
select(15, [14], NULL, NULL, {1, 0}) = 0 (Timeout)
select(15, [14], NULL, NULL, {1, 0}) = 0 (Timeout)
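Since select() keeps timing out, one further check would be whether datagrams for UDP port 161 are piling up in (or being dropped from) the kernel receive queue instead of reaching the daemon. The following is only a minimal sketch, assuming the usual /proc/net/udp layout (hex tx_queue:rx_queue in the fifth column, a decimal drops counter as the last column on kernels that expose it). The name udp_queue_stats is a hypothetical helper, not part of net-snmp, and I am assuming snmpd listens on the default port 161.

```python
def udp_queue_stats(proc_text, port=161):
    """Return a list of (rx_queue_bytes, drops) for UDP sockets bound to `port`.

    `proc_text` is the content of /proc/net/udp. `drops` is None when the
    kernel does not print a drops column (assumption: it is the 13th field).
    """
    stats = []
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or truncated line
        # local_address looks like "00000000:00A1"; the port is hex
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port != port:
            continue
        # fifth field is "tx_queue:rx_queue", both hex byte counts
        _tx_hex, rx_hex = fields[4].split(":")
        drops = int(fields[-1]) if len(fields) >= 13 else None
        stats.append((int(rx_hex, 16), drops))
    return stats
```

Feeding it the output of `cat /proc/net/udp` while snmpd is unresponsive should show whether rx_queue or drops grows between runs. A growing value would suggest packets do reach the socket but are not being read; a steady zero would point further upstream (firewall rules, for instance).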