I have a pod running on Kubernetes for which I am designing a liveness probe. My application reads from a queue (via a loop which continually searches for new messages and executes other functions if it finds one) and is not exposed via HTTP, so I need a command liveness probe. I am pondering whether a simple implementation would work:

livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy

However, I'm unsure whether the cat would succeed even if the application was 'stuck' at some point in the loop - the file would still be there. This comes down to a fundamental lack of understanding of liveness probes which I was unable to find in the documentation - presumably they run somehow in series with your application so if your app is not running, the command cannot be executed? But I am not confident on this point.

If the command can be executed in parallel then I believe I will need some kind of timestamp check where I update a file on each loop and the liveness probe checks its timestamp. If the first way works it is simpler, but can anyone confirm if this is the case? Thanks.

Edit: my app code. I added in the sleep(60)s to try and test whether the liveness probe would fail if the file hadn't been updated in a minute, but they wouldn't be part of the normal app code.

# INITIALISATION CODE

from time import sleep

with open('loaded.txt', 'w') as f:  # readiness probe = check this file exists
    f.write('loaded')

current_backoff = 0
max_backoff = 10
while True:
    if current_backoff < max_backoff:
        current_backoff += 1
    with open('loaded.txt', 'w') as f:  # rewrite the file on every iteration
        f.write('loaded')
        sleep(60)  # test only, not part of the normal app code

    messages = input_queue_client.receive_messages(visibility_timeout=100)
    for message in messages:
        with open('loaded.txt', 'w') as f:  # rewrite the file for every message
            f.write('loaded')
        sleep(60)  # test only, not part of the normal app code
        current_backoff = 0

        # CODE TO PROCESS MESSAGES

    sleep(current_backoff)

My liveness probe attempts:

1.

        livenessProbe:
          exec:
            command:
              - find
              - /var/app/loaded.txt
              - -mmin 
              - '+0.1'
          initialDelaySeconds: 10
          periodSeconds: 10

2. (command returns failure if anything is returned from find, otherwise cat the file)

        livenessProbe:
          exec:
            command:
              - find
              - /var/app/loaded.txt
              - -mmin 
              - '+0.1'
              - -exec
              - cat
              - '/var/app/loaded.txt{}'
              -  ;
          initialDelaySeconds: 10
          periodSeconds: 10

3. (command returns failure if anything is returned from find, otherwise return nothing)

        livenessProbe:
          exec:
            command:
              - find
              - /var/app/loaded.txt
              - -mmin 
              - '+0.1'
              - -exec
              - if[[{}]]
              - ;
          initialDelaySeconds: 10
          periodSeconds: 10

I have also tried all of these with - instead of +. The probe never fails, despite the very short window (which will eventually be longer!) and the sleep command.

Lucy

1 Answer

Liveness probing is done by the kubelet on each node. And yes, it runs in parallel with your application.

In your case, you could touch the /tmp/healthy file each time you start a new iteration of the loop, and use a command like find /tmp/healthy -mmin +0.5 in the health check. This command returns nothing unless the file is older than half a minute. If the health check command returns nothing, it's assumed that the check is passing.
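The "touch the file on each iteration" half of this suggestion could look roughly like the sketch below. It is only an illustration: `beat`, `run`, `receive` and `process` are hypothetical names standing in for the asker's queue client and message handler, and `/tmp/healthy` is assumed to be the path the probe checks.

```python
import time
from pathlib import Path

HEARTBEAT = Path("/tmp/healthy")  # assumed path; must match the one the probe checks


def beat(path=HEARTBEAT):
    """Create the file if it is missing and set its modification time to now."""
    path.touch()


def run(receive, process, path=HEARTBEAT, idle_seconds=1.0, max_loops=None):
    """Poll receive() in a loop, beating once per iteration and once per message.

    receive and process are hypothetical stand-ins for the queue client and
    the message handler. max_loops=None means loop forever.
    """
    loops = 0
    while max_loops is None or loops < max_loops:
        beat(path)
        for message in receive():
            process(message)
            beat(path)
        time.sleep(idle_seconds)
        loops += 1
```

If the loop hangs inside `process`, the file's modification time stops advancing, which is what a timestamp check can detect. The age threshold used by the probe therefore has to be longer than the slowest legitimate message.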

Grigoriy Mikhalkin
  • Thanks for explaining, that's helpful. I've tried to use your suggested solution but I think the health check passes whether or not the file has been modified in the past 0.5 mins. If the command returns nothing, it is assumed that the health check is passing. If the command returns a file, it seems to also assume that it is passing. How can I get it to return an error status code if the file is too old? – Lucy Nov 11 '20 at 16:20
  • I have used: ` livenessProbe: exec: command: - find - /myfile.txt - -mmin - '-0.5' ` Have also tried with +0.5! – Lucy Nov 11 '20 at 16:29
  • @Lucy That's strange, health check shouldn't pass if command returns something. Did you try creating the file just once and not touching it after that? – Grigoriy Mikhalkin Nov 11 '20 at 17:31
  • @Lucy would it be possible for you to show a minimal example of your app, so I could try to reproduce your problem locally? – Grigoriy Mikhalkin Nov 11 '20 at 17:32
  • "health check shouldn't pass if command returns something" - I thought a common health probe command was `cat file.txt`, which would return something (the file contents)? – Lucy Nov 12 '20 at 10:47
  • I have edited the main post with my app structure @Grigoriy Mikhalkin – Lucy Nov 12 '20 at 15:27
  • It seems that find always returns a status code of 0 (success) if it finds the file, whether or not it meets the -mmin criteria. This causes the health probe to pass whether or not the -mmin returns a file. I guess I need to pipe the output to a command whose success code depends on the input being null or non-null, but I'm finding it difficult to find a suitable method. – Lucy Nov 12 '20 at 17:31
  • @Lucy I would suggest asking a new question, like "How to set error code in liveness probe command" or something. – Grigoriy Mikhalkin Nov 12 '20 at 18:29
  • Thanks, I solved the problem this morning by building a separate bash script with specified conditional exit codes, then running this as the liveness probe command. Thanks for your help! – Lucy Nov 13 '20 at 15:33
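
The bash script mentioned in the last comment is not shown in the thread. As a rough illustration of the same idea (a separate check whose exit status depends on the file's age), here is a Python sketch. An exec probe is treated as passing when the command exits with status 0 and failing otherwise, so the output of the command does not matter. The path, the 30-second threshold and the function names are assumptions made for the example.

```python
import os
import sys
import time

HEARTBEAT = "/tmp/healthy"  # assumed path, written by the application loop
MAX_AGE_SECONDS = 30.0      # assumed threshold


def is_fresh(path, max_age, now=None):
    """Return True if path exists and was modified within max_age seconds."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        return False  # a missing or unreadable file counts as unhealthy
    if now is None:
        now = time.time()
    return (now - mtime) <= max_age


def main(argv):
    """Return 0 (healthy) or 1 (stale); argv is [script, path, max_age]."""
    path = argv[1] if len(argv) > 1 else HEARTBEAT
    max_age = float(argv[2]) if len(argv) > 2 else MAX_AGE_SECONDS
    return 0 if is_fresh(path, max_age) else 1


# When saved as a standalone script, end the file with:
#     sys.exit(main(sys.argv))
```

Saved under a hypothetical name such as /var/app/check_liveness.py, it could be used as the probe command in the form `python /var/app/check_liveness.py /var/app/loaded.txt 30`, provided a Python interpreter is available in the container.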