56

I want my systemd service to be restarted automatically on failure. I also want to rate-limit the restarts, allowing a maximum of 3 restarts within 90 seconds. To that end, I used the following configuration:

[Service]  
Restart=always  
StartLimitInterval=90  
StartLimitBurst=3

With this configuration the service is restarted on failure, and after 3 quick failures/restarts it is no longer restarted, as expected. I then expected systemd to start the service again once the timeout (StartLimitInterval) had elapsed, but it does not start the service automatically after the 90-second timeout. If I restart the service manually after the timeout, it works. How can I make systemd start the service automatically after the StartLimitInterval?

pevik
Dinesh P.R.
  • I wrote an article that explains how to create a service, and how to avoid this particular issue: [Creating a Linux service with systemd](https://medium.com/@benmorel/creating-a-linux-service-with-systemd-611b5c8b91d6). – BenMorel Sep 06 '17 at 12:05
  • I think you're looking for `StartLimitIntervalSec`, not `StartLimitInterval`. – Marc Tamsky Oct 24 '17 at 17:10
  • @MarcTamsky these are the same thing. `StartLimitIntervalSec` was added in systemd v230 and should replace `StartLimitInterval`. – Roman Sklyarov Oct 16 '20 at 19:33
  • Actually even `StartLimitIntervalSec` was renamed to [`DefaultStartLimitIntervalUSec`](https://github.com/systemd/systemd/blob/master/src/core/dbus-manager.c#L2563) in [v237](https://github.com/systemd/systemd/commit/c075f5fcf8d3ddb40f4f49c91fa699c2774f6259). But an even earlier change matters for you: the old [`StartLimitInterval` was moved from `[Service]` to the `[Unit]` section](https://github.com/systemd/systemd/commit/6bf0f408e4833152197fb38fb10a9989c89f3a59) in `v229`, as @Ingo described. – pevik Dec 04 '20 at 10:28
  • I'm probably wrong in my previous comment: although there is a `StartLimitIntervalUSec` Unit directive in [`man systemd.directives(7)`](https://www.freedesktop.org/software/systemd/man/systemd.directives.html), [`man systemd.unit(5)`](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#StartLimitIntervalSec=) mentions only `StartLimitIntervalSec`. – pevik Dec 04 '20 at 11:09

5 Answers

60

To have a service restart 3 times at 90-second intervals, include the following lines in your systemd service file:

[Unit]
StartLimitIntervalSec=400
StartLimitBurst=3
[Service]
Restart=always
RestartSec=90

Before systemd-230 it was called just StartLimitInterval:

[Unit]
StartLimitInterval=400
StartLimitBurst=3
[Service]
Restart=always
RestartSec=90

This worked for me for a service that runs a script using Type=idle. Note that StartLimitIntervalSec must be greater than RestartSec * StartLimitBurst; otherwise the service will be restarted indefinitely.
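To make that inequality concrete, here is a sketch of a configuration that would probably not rate-limit at all. It assumes the service fails almost immediately, so that consecutive starts are roughly RestartSec apart; the value 200 is an illustrative assumption, not something taken from a real setup.

```ini
[Unit]
# Illustrative values only. With RestartSec=90 and a service that fails
# right away, starts happen at roughly t = 0 s, 90 s, 180 s, 270 s, ...
# A fourth start inside one window would need the window to span more
# than 3 * 90 = 270 s. With only 200 s, each window holds at most
# 3 starts, so the limit is never exceeded and restarts continue.
StartLimitIntervalSec=200
StartLimitBurst=3

[Service]
Restart=always
RestartSec=90
```

With StartLimitIntervalSec=400 instead, the fourth start (at about 270 s) falls inside the window, so it should be refused and the unit stays stopped.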

It took me some time with a lot of trial and error to work out how systemd uses these options, which suggests that systemd isn't as well documented as one would hope. These options effectively provide the retry cycle time and maximum retries that I was looking for.

References: https://manpages.debian.org/testing/systemd/systemd.unit.5.en.html for the Unit section, https://manpages.debian.org/testing/systemd/systemd.exec.5.en.html for the Service section.

Hannes
jross
23

Some years later, with systemd 232, it doesn't work anymore as described in the question and in the answers from 2016. The option is now named StartLimitIntervalSec, and the sections have changed. Now it has to look like this example:

[Unit]
StartLimitBurst=5
StartLimitIntervalSec=33

[Service]
Restart=always
RestartSec=5
ExecStart=/bin/sleep 6

This will do 5 restarts in 30 sec (5*6) plus one restart in 33 sec. So we have 6 restarts in 33 sec. This exceeds the limit of 5 restarts in 33 sec. So restarts will stop at 5 counts after about 31 sec.

Ingo
  • It looks like `StartLimitInterval` is still supported, if undocumented, in the `Service` section. But the new, preferred `StartLimitIntervalSec` only works in `Unit`. – Danek Duvall Jun 21 '19 at 21:12
  • This example does not seem to work correctly. Because the service "runs" for 6 seconds and then waits 5 seconds to restart, the whole start->start loop takes 11 seconds. So there are 3 (or 4 at the edge) starts in 33 seconds and the StartLimitBurst of 5 is never reached. I've just tried it and it restarts indefinitely. – Tomas Novotny Mar 23 '21 at 10:51
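The arithmetic in the last comment can be turned into a variant of the example that should actually hit the limit. This is a sketch under the assumption that each start-to-start cycle takes about 11 seconds (6 s of run time plus 5 s of RestartSec); the value 60 is chosen for illustration and does not come from the answer.

```ini
[Unit]
# Starts occur at roughly t = 0, 11, 22, 33, 44 and 55 s.
# Five starts are allowed per window, so the sixth start (about 55 s)
# has to fall inside the window for the limit to trigger.
# 60 s is an assumed value that leaves a little slack above 55 s.
StartLimitBurst=5
StartLimitIntervalSec=60

[Service]
Restart=always
RestartSec=5
ExecStart=/bin/sleep 6
```

With StartLimitIntervalSec=33, the window expires before the fifth start at about 44 s, which matches the indefinite restarts reported in the comment.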
12

The behavior you describe is consistent with the documentation:

StartLimitInterval=, StartLimitBurst= Configure service start rate limiting. By default, services which are started more than 5 times within 10 seconds are not permitted to start any more times until the 10 second interval ends. With these two options, this rate limiting may be modified. Use StartLimitInterval= to configure the checking interval (defaults to DefaultStartLimitInterval= in manager configuration file, set to 0 to disable any kind of rate limiting). Use StartLimitBurst= to configure how many starts per interval are allowed (defaults to DefaultStartLimitBurst= in manager configuration file). These configuration options are particularly useful in conjunction with Restart=; however, they apply to all kinds of starts (including manual), not just those triggered by the Restart= logic. Note that units which are configured for Restart= and which reach the start limit are not attempted to be restarted anymore; however, they may still be restarted manually at a later point, from which point on, the restart logic is again activated. Note that systemctl reset-failed will cause the restart rate counter for a service to be flushed, which is useful if the administrator wants to manually start a service and the start limit interferes with that.

I am still trying to figure out how to achieve the behavior you want.

guettli
Youssef Eldakar
  • This is more a comment than an answer as you point out. – Dave M Feb 02 '16 at 13:29
  • exactly what i needed, ty – Some Linux Nerd Nov 02 '17 at 18:35
  • According to the documentation you've linked, shouldn't it be `StartLimitIntervalSec=` (and `DefaultStartLimitIntervalSec=`)? Note the addition of `Sec` to both parameter names. – Doktor J Jan 15 '19 at 21:06
  • It's a version thing, the online docs are current, but released systems (like CentOS 7) are running 219, which doesn't support the Sec ending and accepts StartLimitInterval in [Service]. – stolenmoment Jun 18 '21 at 11:53
2

You can use StartLimitAction=reboot. This will reboot the system when the start rate limit configured with StartLimitInterval and StartLimitBurst is hit.

StartLimitAction= Configure the action to take if the rate limit configured with StartLimitInterval= and StartLimitBurst= is hit. Takes one of none, reboot, reboot-force, or reboot-immediate. If none is set, hitting the rate limit will trigger no action besides that the start will not be permitted. reboot causes a reboot following the normal shutdown procedure (i.e. equivalent to systemctl reboot). reboot-force causes a forced reboot which will terminate all processes forcibly but should cause no dirty file systems on reboot (i.e. equivalent to systemctl reboot -f) and reboot-immediate causes immediate execution of the reboot(2) system call, which might result in data loss. Defaults to none.
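As a rough sketch of how this could look in a unit file: the placement of the StartLimit* options depends on the systemd version (older releases, such as the one whose documentation is quoted here, read them from [Service]; newer ones read them from [Unit]), and the ExecStart path is a hypothetical placeholder.

```ini
[Unit]
# On older systemd versions these three options belong in [Service]
# and the interval option is spelled StartLimitInterval=.
StartLimitIntervalSec=90
StartLimitBurst=3
StartLimitAction=reboot

[Service]
Restart=always
# Hypothetical binary, used here only for illustration.
ExecStart=/usr/local/bin/my-daemon
```

Because the action fires as soon as the limit is hit, a service that fails persistently could send the machine into repeated reboots, so it is worth trying this on a test system first.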

mcv
1

You can set OnFailure to start another service when this fails. In the on-fail service you can run a script that waits and then restarts your service.

For a sample of how to set this up, see "Systemd status mail on unit failure" and modify it to restart the service instead.
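A minimal sketch of this approach, assuming two hypothetical units named my-app.service and my-app-recover.service, a 90 second wait, and systemctl located at /usr/bin/systemctl (the path varies between distributions). Both files are shown in one block, separated by comments. On older systemd versions the StartLimit* options would go in [Service] as StartLimitInterval= and StartLimitBurst=.

```ini
# /etc/systemd/system/my-app.service (hypothetical name and path)
[Unit]
StartLimitIntervalSec=90
StartLimitBurst=3
# Started when my-app.service enters the failed state, for example
# after the start limit has been hit.
OnFailure=my-app-recover.service

[Service]
Restart=always
ExecStart=/usr/local/bin/my-app

# /etc/systemd/system/my-app-recover.service (hypothetical name and path)
[Unit]
Description=Wait, then start my-app.service again

[Service]
Type=oneshot
# Wait out the rate-limit window, clear the failed state and the
# start counter, then start the main service again.
ExecStart=/bin/sleep 90
ExecStart=/usr/bin/systemctl reset-failed my-app.service
ExecStart=/usr/bin/systemctl start my-app.service
```

With this arrangement the service should be retried in bursts: up to three quick starts, a pause of about 90 seconds, then another burst.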

laktak