9

According to the docs:

Failed containers that are restarted by Kubelet, are restarted with an exponential back-off delay, the delay is in multiples of sync-frequency 0, 1x, 2x, 4x, 8x … capped at 5 minutes and is reset after 10 minutes of successful execution.
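
As a rough illustration of the schedule quoted above, here is a minimal sketch. It is not the kubelet's actual code: `backoff_delays` is a hypothetical helper, and the 10-second base is an assumed stand-in for sync-frequency. The reset after 10 minutes of successful execution is not modeled.

```python
def backoff_delays(base_seconds=10, cap_seconds=300, restarts=8):
    """Return the delay (in seconds) before each of the first `restarts` restarts.

    Multipliers follow the quoted pattern 0, 1x, 2x, 4x, 8x, ...
    and each delay is capped at `cap_seconds` (5 minutes).
    base_seconds=10 is an assumption, not a value taken from the quote.
    """
    delays = []
    for n in range(restarts):
        # First restart is immediate; afterwards the multiplier doubles each time.
        multiplier = 0 if n == 0 else 2 ** (n - 1)
        delays.append(min(base_seconds * multiplier, cap_seconds))
    return delays


if __name__ == "__main__":
    print(backoff_delays())
```

Under these assumptions the cap is reached by the seventh restart, which is the waiting time I would like to avoid.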

Is there any way to define a custom RestartPolicy? I want to minimize the back-off delay as much as possible and drop the exponential behavior.

As far as I can find, you can't even configure the RestartPolicy, let alone make a new one...

Andrei Sinitson
user1708860

2 Answers

2

The backoff delay is not tunable because changing it could severely affect the reliability of the kubelet. Imagine you have some pods that keep crashing on the node: the kubelet would continuously restart all those pods/containers with no break, consuming a lot of resources.

Why do you want to change the restart backoff delay?

Yu-Ju Hong
  • 1
    Because the behavior you just described is exactly what I want to achieve. I'm in an environment where I don't have to worry about resources, so I can sometimes afford faster restart times at the cost of whatever resources they need. I have some modules that I want to configure for the fastest restart times possible (even if they would fail again in a loop), as their role in the application is critical. – user1708860 Dec 12 '16 at 22:43
  • 1
    If the app container keeps crashing, there are other issues (e.g., wrong configuration) and restarts don't really help. If your container crashes only once in a while, the backoff would have been reset already and the kubelet should restart it immediately. Why would you expect your app container to fail frequently? Also, even if you don't care about resources, this could potentially overwhelm the container runtime (e.g., docker) and cause the entire node to be less reliable. The fact that you have resources doesn't mean all the daemons can utilize them effectively. – Yu-Ju Hong Dec 12 '16 at 23:11
  • We are moving a legacy system into a private cloud, and the scenario of an app crashing in a loop because of some trashed data in the DB is not uncommon. But that trashed data is removed after some time, and the app can then restart safely. If I restart the same container, I don't expect the system to use an exponential amount of resources. I'd expect the memory to stay pretty much constant, with maybe some spikes, and the CPU to run wild, but that's a hit I can take. The apps are critical and we want to achieve the fastest restart times possible. The exponential restart time is what frightens us. – user1708860 Dec 12 '16 at 23:31
  • 2
    It could make sense to be able to tune the policy. Imagine this situation: you have an application that consumes data from a queue and does something with it. The queue being the sole input, it makes sense to crash if the queue becomes unavailable (network outage). However, if you keep trying to re-create the container every 5 mins, you are not wasting many resources, but you would have your application back online automatically as soon as the queue becomes accessible. As it is, you have to restart manually or manage the reconnect in your application. – Emil D Jan 10 '18 at 21:19
  • 1
    There is a GitHub issue for this. Make sure to voice your opinion there; I would have thought it natural for k8s to make this option configurable, because every application has different purposes. https://github.com/kubernetes/kubernetes/issues/57291#issuecomment-727653127 – ScalaWilliam Nov 15 '20 at 23:05
1

As for customizing your RestartPolicy, according to the Kubernetes documentation:

Only a .spec.template.spec.restartPolicy equal to Always is allowed, which is the default if not specified.

You can see the detailed answer by @Rohit here.

Livne Rosenblum