
I am working on an application on Debian Linux that uses the software watchdog to monitor other services through the PID files those services create.

I am following the steps from http://linux.die.net/man/5/watchdog.conf and installed it with:

apt-get install watchdog

The mechanism is that watchdog checks for the existence of the PID files configured in /etc/watchdog.conf.

I tested it by stopping a service with service service-name stop

Watchdog detects that the service is no longer running and reboots the system after a delay equal to the watchdog timeout period.

Consider a product without a display: if, for example, a service's configuration files are corrupted, the system would keep rebooting indefinitely without any notification to the end user.

In practice, before watchdog takes an action such as reboot, halt, or soft restart, I want to know its status so that the programmer can implement notification logic for the end user.

Otherwise, is it possible to modify the watchdog init script in /etc/init.d/ to call a user program when the software watchdog stops, so that the programmer can maintain a counter in non-volatile memory and avoid an endless reboot loop?
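To make the counter idea concrete, here is a minimal Python sketch of what I have in mind. The function names, the JSON file format, and the limit are all hypothetical; nothing here is provided by the watchdog package, and how the program would get called before a reboot is exactly the open question.

```python
import json
import os
import tempfile


def bump_boot_counter(path, limit):
    """Increment a persistent counter and report whether another reboot is allowed.

    `path` is a file on non-volatile storage (hypothetical location).
    Returns (count, allow_reboot), where allow_reboot is False once
    the count exceeds `limit`.
    """
    try:
        with open(path) as f:
            count = int(json.load(f)["count"])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing or corrupted file: start counting from zero.
        count = 0
    count += 1

    # Write to a temporary file and rename it, so a power cut during
    # the write cannot leave a half-written counter behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"count": count}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)
    return count, count <= limit


def reset_boot_counter(path):
    """Clear the counter, e.g. after the system has stayed healthy for a while."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

The intent is that the program would call bump_boot_counter() before each watchdog-triggered reboot and, once the limit is exceeded, notify the user instead of rebooting again.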

Beyond the above, I want to know more about how to get status from this software watchdog (the watchdog daemon). I have set it up to monitor services, CPU overload, temperature, and so on, but I get no event before the watchdog acts, so I cannot tell whether the system restarted because of a stopped service, CPU overheating, CPU overload, or something else.

1 Answer

A watchdog is designed as a last resort to rescue a system after it has failed beyond recovery. A hardware watchdog will physically reset the CPU, and is used to make sure that a system doesn't hang for long periods.

There is no way to receive a warning that this will happen in software because it's assumed that all software has failed.

If you need a solution that detects that a process is no longer responding, you should make that separate from the watchdog.
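As a rough illustration of such a separate check, the following Python sketch tests whether the process named in a PID file is still alive. The function names are made up for this example, and it assumes the PID file contains a single integer; it only detects that a process has exited, not that it has hung.

```python
import os


def pid_from_file(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or unreadable."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid_file):
    """Return True if the process named in pid_file currently exists."""
    pid = pid_from_file(pid_file)
    if pid is None or pid <= 0:
        return False
    try:
        # Signal 0 performs the existence and permission checks
        # without actually delivering a signal.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A monitor built on this could log the failure, notify the user, or try restarting the service, all before any reboot comes into play.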

See the answers to this question for something similar: Designing a monitor process for monitoring and restarting processes
