-1

I am monitoring multiple servers and routers with SNMP. Everything was working, but today I saw that 3 servers were no longer responding via SNMP. The 3 SNMP daemons stopped at the same moment (Saturday 6 AM) with the same last log entry (Cannot statfs : /var/docker/lib.....)

I tried to restart the SNMP daemons, but systemctl runs into a timeout and can't restart them. Nothing was changed in the configuration.

Does anyone have an idea?

Thanks

rebug

1 Answer

1

"Cannot statfs" likely comes from the disk usage monitor in snmpd, which iterates over the mounted file systems and asks each one how much free space is left.

If a statfs(2) call fails, that is a serious problem on the machine. This is one of the syscalls that basically just looks up information in a shared structure and returns it, so about the only way it can fail is in synchronizing access to that structure.

So something is hanging there while holding exclusive access to some structure in the kernel. That is also what blocks file system access, which in turn causes the restart timeout.
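To find out which mount is the one that hangs, something like the following rough sketch might help. It is not what snmpd does internally; it just walks /proc/mounts and calls os.statvfs() on each mount point (on Linux that ends up in the statfs family of syscalls). Each call runs in a child process with a timeout, so a stuck mount does not freeze the script itself. The function names and the 5 second timeout are my own choices for illustration.

```python
import re
import subprocess
import sys

# Code run in the child process: an OSError from statvfs makes it exit with status 1.
_CHILD = "import os, sys; os.statvfs(sys.argv[1])"


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces and similar characters as octal, e.g. \040.
        mount_point = re.sub(
            r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), fields[1]
        )
        entries.append((mount_point, fields[2]))
    return entries


def probe(path, timeout=5.0):
    """Return 'ok', 'error' or 'hung' for a statvfs call on path."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return "ok" if proc.wait(timeout=timeout) == 0 else "error"
    except subprocess.TimeoutExpired:
        # A child blocked in uninterruptible sleep may ignore the kill,
        # so do not wait for it again.
        proc.kill()
        return "hung"


def check_all_mounts(mounts_file="/proc/mounts", timeout=5.0):
    """Probe every mount listed in mounts_file."""
    with open(mounts_file) as fh:
        entries = parse_mounts(fh.read())
    return [(mp, fs, probe(mp, timeout)) for mp, fs in entries]
```

Calling check_all_mounts() and printing the entries whose status is not "ok" should show which mount points are affected. The file system type is included so you can see at a glance whether they are network mounts.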

If this is a local file system, I'd reboot and force a file system check during boot. Before systemd, the mechanism to do that would have been shutdown -Fr now, but systemd requires you to set a kernel command-line parameter instead (fsck.mode=force).

If this is on a SAN or similar, I'd find out what's wrong with the SAN first, and then do a file system check.

Three hosts failing at the same time can really only be explained by "this file system is on a SAN that failed."

Simon Richter
  • Thanks for the reply. The 3 hosts were sharing some mounts with each other. There is a problem with some paths that got unmounted and that we can't mount anymore, but the 3 hosts can still communicate. We can't figure out what exactly is going on. – rebug Apr 19 '21 at 14:25