54

I need to send network messages when one of my systemd services crashes or hangs (i.e., enters the failed state; I detect hangs with WatchdogSec=). I noticed that newer systemd versions have FailureAction=, but it doesn't allow arbitrary commands, only actions such as reboot or shutdown.

Specifically, I need a way to have one network message sent when systemd detects the program has crashed, and another when it detects it has hung.

I'm hoping for a better answer than "parse the logs", and I need something that has a near-instant response time, so I don't think a polling approach is good; it should be something triggered by the event occurring.

Display Name

3 Answers

51

systemd units support OnFailure=, which activates one or more units when the unit enters the failed state. You can put something like

 OnFailure=notify-failed@%n

Then create the notify-failed@.service template unit, where you can use whichever specifiers you need (you will probably want at least %i) to launch the script or command that sends the notification.
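To tell a crash from a hang, as the question asks, the handler can read the failed unit's Result property. The following is a minimal sketch, not a definitive implementation: it assumes notify-failed@.service runs a script (hypothetically /usr/local/bin/notify-failed.sh) with %i as its argument, that a missed WatchdogSec= deadline is reported as Result=watchdog, and that `logger` stands in for whatever actually sends your network message.

```shell
#!/bin/sh
# Hypothetical handler script, run by notify-failed@.service as:
#   ExecStart=/usr/local/bin/notify-failed.sh %i

# Map systemd's Result value to the two cases from the question.
classify_result() {
    case "$1" in
        watchdog) echo "hung" ;;                        # WatchdogSec= deadline missed
        signal|core-dump|exit-code) echo "crashed" ;;   # killed, dumped core, or exited non-zero
        *) echo "failed ($1)" ;;                        # e.g. timeout, start-limit-hit
    esac
}

# Read the Result property; systemctl prints a line like "Result=watchdog".
unit_result() {
    line=$(systemctl show -p Result "$1")
    echo "${line#Result=}"
}

# Placeholder transport (assumption): replace with your real network message.
send_message() {
    logger -t notify-failed "$1"
}

# Only act when a unit name is given, so the functions can be reused or tested.
if [ "$#" -gt 0 ]; then
    unit=$1
    send_message "$unit: $(classify_result "$(unit_result "$unit")")"
fi
```

Note that if the monitored service also uses Restart=, it may be restarted instead of entering the failed state, in which case OnFailure= would not fire until the start limit is hit.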

You can see a practical example in http://northernlightlabs.se/systemd.status.mail.on.unit.failure

Davy Landman
Pablo Martinez
  • 5
    There are a couple corrections needed to the instructions on the linked site. First, `notify%n.service` is redundant, and will result in `notify@my-service.service.service`. Second, `%i` should be used instead of `%I`, or all dashes in the name will be converted to forward slashes. – orodbhen Jun 22 '16 at 15:42
  • 7
    Is there a way to do this for multiple or all units, without modifying their unit files? – Vladimir Panteleev Sep 10 '17 at 12:52
  • @VladimirPanteleev - you don't need to modify the actual unit files - you can just add an override for that specific feature. For example, run `systemctl edit my-service.service` and in the editor that opens add a line `[Unit]` followed by `OnFailure=notify-failed@%n`, save and exit. This will create an override file in `/etc/systemd/system/my-service.service.d/override.conf` with the added functionality (of course you can automate the creation of such files for multiple services, just don't forget to do `systemctl daemon-reload` if you modified files not through `systemctl`). – Guss Feb 06 '22 at 11:41
  • For anybody looking to do this for all service files at once, check **Example 3** at the very end of [systemd.unit](https://www.freedesktop.org/software/systemd/man/systemd.unit.html). You need to place a configuration under `service.d` directory and it will apply to all services. – Felipe May 19 '22 at 17:59
  • @Felipe - I tried that on an Ubuntu 18.04 system but can't get it to work as advertised. The `OnFailure= failure-handler@%n.service ` does work when attached to the individual service's `[Unit]` section but not when `/etc/systemd/system/service.d/10-all.conf` is the only place it is defined. – cueedee Feb 07 '23 at 08:03
  • @Felipe - ...adding to my own comment, it seems that top-level drop-ins need `systemd` version [244](https://github.com/systemd/systemd/blob/v244/man/systemd.unit.xml#L195) or newer and Ubuntu 18 only has version 237. – cueedee Feb 09 '23 at 21:09
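The top-level drop-in approach from the comments above could be set up roughly as follows. This is a sketch under assumptions: it reuses the notify-failed@ handler name from the answer, needs systemd 244 or newer (per the comment above), and masks the drop-in for the handler itself with a symlink to /dev/null, as I understand Example 3 in systemd.unit to do. The function name and the directory argument are illustrative only.

```shell
#!/bin/sh
# Sketch: give every service an OnFailure= handler via a top-level drop-in.
# Pass the unit directory as $1 (normally /etc/systemd/system); a scratch
# directory lets you inspect the result without touching the system.
install_global_onfailure() {
    unit_dir=${1:-/etc/systemd/system}

    # Drop-in applied to all *.service units.
    mkdir -p "$unit_dir/service.d"
    printf '[Unit]\nOnFailure=notify-failed@%%n\n' > "$unit_dir/service.d/10-all.conf"

    # Mask the same drop-in for the handler, so a failing handler
    # does not trigger yet another handler instance.
    mkdir -p "$unit_dir/notify-failed@.service.d"
    ln -sf /dev/null "$unit_dir/notify-failed@.service.d/10-all.conf"
}

# Only act when a directory is given on the command line.
if [ "$#" -gt 0 ]; then
    install_global_onfailure "$1"
fi
```

After running it against the real unit directory, `systemctl daemon-reload` is still needed for systemd to pick up the new files.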
33

Just my way to notify:

/etc/systemd/system/notify-email@.service

[Unit]
Description=Send email

[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c '/usr/bin/systemctl status %i | /usr/bin/mailx -Ssendwait -s "[SYSTEMD_%i] Fail" your_admin@company.blablabla'

[Install]
WantedBy=multi-user.target

Add it to systemd:

systemctl enable /etc/systemd/system/notify-email@.service

In the other services, add:

[Unit]
OnFailure=notify-email@%n

Reload the configuration:

systemctl daemon-reload
tjmcewan
ceinmart
  • 1
    Is there a way to avoid triggering it lots of times in a row? In some situations receiving 1K emails about a service that failed at night and tried over and over again to restart itself isn't helpful. – starbeamrainbowlabs Sep 20 '19 at 19:27
  • 1
    As far I know, no, there is no option from systemd. You should put some control into the bash command, something like touching a file and checking if it have +10min for example... in simple command logic: find -mmin +10 && send email && touch file ; – ceinmart Apr 07 '20 at 14:30
  • 2
    Why are you enabling the notification service? It's supposed to be started by other units, no reason to start it on boot. – drrlvn Mar 18 '22 at 08:30
  • `/bin/bash` instead of `/usr/bin/bash` – JulianW Oct 18 '22 at 12:32
  • 1
    I'm a newbie here, but what I read at https://www.freedesktop.org/software/systemd/man/systemd.unit.html (Example 3. Top level drop-ins with template units) and https://unix.stackexchange.com/a/506374/16256 makes me wonder if the `WantedBy=multi-user.target` line is unnecessary or unwanted. Would it cause this to send a notification at each boot? – nealmcb May 30 '23 at 04:15
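The stamp-file throttling suggested in the comments above could look something like this. It is only a sketch: STAMP_DIR and MIN_MINUTES are hypothetical variable names, the `echo` is a placeholder for the real mail or network command, and `find -mmin` is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Sketch: send at most one notification per unit every MIN_MINUTES minutes.

# Succeeds if no stamp file exists yet, or if it is older than the threshold.
should_notify() {
    stamp=$1
    min_age=$2
    [ -e "$stamp" ] || return 0
    # find prints the path only when the file was last modified more than
    # min_age minutes ago.
    [ -n "$(find "$stamp" -mmin +"$min_age")" ]
}

# Notify for a unit unless a recent stamp file says we already did.
notify_if_due() {
    unit=$1
    stamp="${STAMP_DIR:-/run/notify-failed}/$unit.stamp"
    if should_notify "$stamp" "${MIN_MINUTES:-10}"; then
        mkdir -p "$(dirname "$stamp")"
        touch "$stamp"
        echo "notify: $unit"   # placeholder: call mailx or your own sender here
    fi
}

# Only act when a unit name is given on the command line.
if [ "$#" -gt 0 ]; then
    notify_if_due "$1"
fi
```

The notification unit's ExecStart= would then call this script with %i instead of invoking mailx directly.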
0

I came across this utility which seems to provide this: https://github.com/joonty/systemd_mon