
I am working with Kubernetes 1.3.6, with this section in my application's deployment file:

    livenessProbe:
      httpGet:
        path: /liveness
        port: 8082
      initialDelaySeconds: 120

.. so when I describe the pod I get this:

    Liveness: http-get http://:8082/liveness delay=120s timeout=1s period=10s #success=1 #failure=3

My application usually starts in 110-115 seconds, but sometimes it takes longer (due to DB delays, external service retries, etc.).

The problem I see is that when startup takes more than 130-140 seconds (initialDelaySeconds + period), Kubernetes forces a shutdown and the pod restarts from scratch. With many replicas (50-60), the full deployment sometimes takes 10-15 minutes longer than a normal one. Obviously one solution is to increase initialDelaySeconds, but then every deployment would take a lot more time.

I had a look at the probe API reference and there's nothing that seems to solve this problem: http://kubernetes.io/docs/api-reference/v1/definitions/#_v1_probe

Ideally I would like something that works the opposite way: not an "initialDelaySeconds", but a maximum amount of time for the pod to start. If that time passes, Kubernetes forces the pod to shut down and tries again.

Michele Orsi

2 Answers


I finally ended up with a solution that, at the moment, works perfectly!

I set:

  • readinessProbe.initialDelaySeconds: set to the minimum startup time of the application
  • livenessProbe.initialDelaySeconds: set to the maximum startup time of the application, plus a couple of seconds

That way, Kubernetes starts checking the readiness probe (after readinessProbe.initialDelaySeconds) in order to add the pod to the load balancing, and only later (after livenessProbe.initialDelaySeconds) also starts checking the liveness probe, in case the pod needs restarting.
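As a sketch of what this looks like in the deployment file, assuming the question's /liveness endpoint on port 8082, a hypothetical /readiness path, and example timings based on the 110-115 s minimum startup mentioned above (adjust all values to your own measurements):

    readinessProbe:
      httpGet:
        path: /readiness   # hypothetical path, not from the question
        port: 8082
      initialDelaySeconds: 110   # minimum observed startup time
    livenessProbe:
      httpGet:
        path: /liveness
        port: 8082
      initialDelaySeconds: 150   # maximum observed startup time + a couple of seconds

The design trade-off: a fast-starting pod begins receiving traffic as soon as readiness succeeds, while the liveness probe only starts killing pods after the worst-case startup window has passed.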

Michele Orsi

Well, it seems like the time you are talking about is actually there, just not explicitly named.

The formula for the time you are looking for would be

    initialDelaySeconds + period * (failureThreshold - 1)

(the -1 is because the first probe runs right after initialDelaySeconds). You can tune the maximum startup time (the parameter you want) by changing these three values.
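Plugging in the question's values (delay=120s, period=10s, failureThreshold=3) gives 120 + 10 × (3 − 1) = 140 s, which matches the 130-140 s window observed. Under that assumption, raising failureThreshold would be one way to extend the window (though see the comments below, which dispute whether the threshold applies before the first success):

    livenessProbe:
      httpGet:
        path: /liveness
        port: 8082
      initialDelaySeconds: 120
      periodSeconds: 10
      failureThreshold: 5   # hypothetical: would allow 120 + 10*(5-1) = 160 s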

EDIT: after the comment from the OP, the answer above is wrong; it seems like increasing initialDelaySeconds is the only thing you can do for now.

Nebril
  • Well, the problem is that the description says "Minimum consecutive failures for the probe to be considered failed AFTER HAVING SUCCEEDED. Defaults to 3. Minimum value is 1." So the probe should succeed once before failureThreshold does its job! – Michele Orsi Sep 06 '16 at 08:25
  • You are right, I missed that. I think that your only option is to increase initialDelaySeconds for now. And you can submit a feature request in https://github.com/kubernetes/kubernetes/issues explaining your use case. – Nebril Sep 06 '16 at 08:35
  • OR (it is hacky, but may work) you can create a liveness probe that succeeds on its first run (maybe use a conditional in bash with setting an env variable?), but runs your normal liveness check on subsequent runs. This way, you will have a "succeeded" pod and you will be able to use period and failureThreshold. – Nebril Sep 06 '16 at 08:40
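A rough sketch of that hack as an exec probe. Note that an env variable set in one probe run would not survive to the next (each exec probe spawns a fresh process), so a marker file on disk is used here instead; the /tmp path, the wget check, and the shell availability in the container image are all assumptions:

    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        # hypothetical sketch: the first run always succeeds and drops a marker
        # file; subsequent runs perform the real HTTP check against the
        # /liveness endpoint from the question
        - |
          if [ ! -f /tmp/liveness-primed ]; then
            touch /tmp/liveness-primed
            exit 0
          fi
          wget -q -O /dev/null http://localhost:8082/liveness
      periodSeconds: 10
      failureThreshold: 3

With this in place the probe registers one success immediately, so the failureThreshold count would apply to the real checks that follow.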