I've been working on a systemd service to wrap an administration script and I'm trying to gracefully handle it completely breaking.

Right now I have Restart set to always so it will try again when something fails, but some failure states require attention (missing config file, bad SQL, etc.), so I don't want it continuously spinning in the background in an uncorrectable state.

I found StartLimitInterval, StartLimitBurst, and StartLimitAction, which stop restart attempts after X failures in Y seconds, but it turns out that the only actions available for StartLimitAction are rebooting or shutting down the machine, which is overkill.
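
For reference, the rate-limiting setup described above might look roughly like the fragment below. This is only a sketch, not the actual unit from the question: the script path and all numeric values are placeholders. It uses the directive names from the question and puts them in `[Service]`, which is where older releases expect them; newer releases document `StartLimitIntervalSec=` and `StartLimitBurst=` under `[Unit]` instead, so check `man systemd.unit` and `man systemd.service` for the installed version.

```ini
[Service]
# Hypothetical path to the administration script
ExecStart=/usr/local/bin/admin-script.sh
Restart=always
# Wait 5 seconds between restart attempts (placeholder value)
RestartSec=5
# Give up after 5 starts within 60 seconds (placeholder values)
StartLimitInterval=60
StartLimitBurst=5
```

Note that the restart delay multiplied by the burst count has to fit inside the interval, otherwise the limit is never reached and the service keeps restarting.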

I've been looking at OnFailure and wrote a mini service to send an alert email when it's triggered, but OnFailure triggers every time the service dies, not when it hits the start limit, so we get a bunch of emails instead of just one.
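
A minimal sketch of that arrangement, assuming a template unit for the alert, is shown below. The unit names `admin-script.service` and `alert-email@.service` and both script paths are hypothetical; the original mini service is not shown in the question.

```ini
# /etc/systemd/system/admin-script.service (hypothetical name)
[Unit]
Description=Administration script wrapper
# %n expands to the full name of this unit
OnFailure=alert-email@%n.service

[Service]
ExecStart=/usr/local/bin/admin-script.sh
Restart=always

# /etc/systemd/system/alert-email@.service (hypothetical name)
[Unit]
Description=Send an alert email about %i

[Service]
Type=oneshot
# %i is the instance name, here the name of the failed unit
ExecStart=/usr/local/bin/send-alert.sh %i
```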

Any ideas of what to try next?

Will

2 Answers

From the systemd.unit man page:

OnFailure=

A space-separated list of one or more units that are activated when this unit enters the "failed" state. A service unit using Restart= enters the failed state only after the start limits are reached.

However, the second sentence appears to be a newer addition: it is in the manual for systemd version 241 on my Arch installations, but not in the manual for version 219 on my CentOS 7 installation.

You can check your systemd version with `systemctl --version`.
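
On a version whose manual includes that second sentence, a unit along the following lines should trigger the alert once, when the start limit is reached, rather than on every crash. Treat it as a sketch under assumptions: the alert unit name, the script path, and the numeric values are all hypothetical, and the placement of the start-limit directives in `[Unit]` follows the newer manual pages.

```ini
[Unit]
Description=Administration script wrapper
# Allow at most 5 starts within 60 seconds (placeholder values)
StartLimitIntervalSec=60
StartLimitBurst=5
# Hypothetical alert unit, activated when this unit enters the "failed" state
OnFailure=alert-email@%n.service

[Service]
# Hypothetical path to the administration script
ExecStart=/usr/local/bin/admin-script.sh
Restart=always
RestartSec=5
```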

I know it's an old question, but I wanted to share this for anyone else who has the same problem.

Haystack

StartLimitAction may be what you want. The man page says:

... Takes one of none, reboot, reboot-force, reboot-immediate, poweroff, poweroff-force or poweroff-immediate. If none is set, hitting the rate limit will trigger no action besides that the start will not be permitted.

It seems that setting StartLimitAction to none may do what you want.

user9517
    It's not quite what I'm looking for. What would be ideal is the ability to have `StartLimitAction` execute an arbitrary command instead of ignoring or rebooting. `OnFailure` triggers my alert script every time it fails, and I really only want the alert to be triggered when the service hits the start limit and will not be restarted. I'm just not sure it's possible without making some sort of weird wrapper with a counter. – Will Jun 27 '16 at 21:49