
I have a program which is run by systemd with a service file like this:

[Unit]
Description=...

[Service]
Type=notify
ExecStart=/usr/sbin/myprogram
WatchdogSec=1
KillMode=process
KillSignal=SIGTERM
Restart=always

The program regularly sends the keep-alive notification to the watchdog. From time to time, the program seems to hang; the watchdog then terminates it and systemd restarts it. Before the watchdog terminates the program, I'd like to capture some information from it by executing a command or running a script (e.g. `gdb -p <PID> --batch -ex 'thread apply all backtrace'`). How would I do this?

jotrocken

1 Answer


Add an ExecStop= line to your service.

[Service]
ExecStart=....
ExecStop=/path/to/SomeOtherProgram
....

According to the systemd manual, if ExecStop= is set, systemd runs that command first. If the process started by ExecStart= is still running afterwards, it is terminated according to KillMode=.

ExecStop= Commands to execute to stop the service started via ExecStart=. This argument takes multiple command lines, following the same scheme as described for ExecStart= above. Use of this setting is optional. After the commands configured in this option are run, it is implied that the service is stopped, and any processes remaining for it are terminated according to the KillMode= setting (see systemd.kill(5)). If this option is not specified, the process is terminated by sending the signal specified in KillSignal= when service stop is requested. Specifier and environment variable substitution is supported (including $MAINPID, see above).
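As an illustration of what `/path/to/SomeOtherProgram` could look like, here is a minimal sketch in Python. It assumes that gdb is installed and that the unit passes the PID explicitly, e.g. `ExecStop=/path/to/SomeOtherProgram $MAINPID`. The function names `build_gdb_command` and `dump_backtraces` and the 10-second timeout are made up for this example; only the gdb invocation comes from the question.

```python
#!/usr/bin/env python3
import subprocess
import sys


def build_gdb_command(pid):
    # Same gdb invocation as in the question, split into an argv list.
    return ["gdb", "-p", str(pid), "--batch", "-ex", "thread apply all backtrace"]


def dump_backtraces(pid_text, timeout=10.0):
    # Reject anything that is not a plain decimal PID before touching gdb.
    if not pid_text.isdigit():
        print("expected a numeric PID, got %r" % pid_text, file=sys.stderr)
        return 2
    try:
        result = subprocess.run(
            build_gdb_command(pid_text),
            capture_output=True,
            text=True,
            timeout=timeout,  # assumed limit so a stuck gdb cannot block the stop
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        print("backtrace dump failed: %s" % exc, file=sys.stderr)
        return 1
    # Forward gdb's output to stderr, which systemd normally sends to the journal.
    sys.stderr.write(result.stdout + result.stderr)
    return result.returncode


# Only act when exactly one argument (the PID) was passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(dump_backtraces(sys.argv[1]))
```

Because this runs as a stop command, it would also run on every intentional stop of the service, not only when the program hangs.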

EDIT

As noted in the comments below, this solution may not work when the service is terminated by the watchdog (WatchdogSec=).

iamauser
  • This should help, but there are two things which I dislike about this approach: 1) It will execute the `ExecStop=` command also when the service is intentionally stopped, e.g. during deployment. 2) If I understand correctly, this overrides the default stop command. So the service will not be stopped, unless either setting `TimeoutStopSec=` which triggers a SIGKILL or adding another `ExecStop=` with the original stop command (which I don't know) or something else that creates the condition described in your quote. Nevertheless, I will give it a try. Do you see any other solution? – jotrocken Oct 18 '18 at 09:25
  • No, you don't need any extra attributes. `ExecStop=` runs the other script and, once it has finished, if `systemd` still finds the process that was started from `ExecStart=`, it applies `KillMode=process` with `KillSignal=SIGTERM`. Without the `ExecStop=`, it would have done the same anyway. – iamauser Oct 18 '18 at 13:46
  • Unfortunately, it turned out that this does not work as we expected. systemd runs the `ExecStop=` command _after_ the watchdog terminated the service (first sending `SIGABRT` and then `SIGTERM` when timing out after 90 seconds). I guess I cannot use any systemd mechanics here but have to handle the SIGABRT internally and collect the information from within the service. – jotrocken Oct 19 '18 at 08:58
  • okay, tbh, I did test it but without the `Watchdog` and it does kill the process...It's possible `Watchdog` is overriding the structure. – iamauser Oct 19 '18 at 14:34
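
The in-process approach mentioned in the comments (collecting the information from within the service when the watchdog's `SIGABRT` arrives) could be sketched as follows. This is only a sketch and rests on a loud assumption: that the service is written in Python, which the question does not say. The helper name `enable_abort_dump` is made up. It relies on the standard `faulthandler` module, whose `enable()` installs handlers for `SIGABRT` and the other fatal signals.

```python
import faulthandler
import sys


def enable_abort_dump(stream=sys.stderr):
    """Dump the tracebacks of all threads to `stream` on SIGABRT and other fatal signals."""
    # faulthandler installs its handler at the C level, so it does not depend
    # on the interpreter getting a chance to run Python-level signal handlers.
    # After writing the dump, the signal still terminates the process, so
    # Restart=always keeps working as before.
    faulthandler.enable(file=stream, all_threads=True)


# Hypothetical usage at service start-up, before entering the main loop:
# enable_abort_dump()  # under systemd, stderr normally ends up in the journal
```

For a native program, the analogous idea would be a `SIGABRT` handler that writes a backtrace before the process exits; that variant is not shown here.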