How to Properly Update the Status of a Job

Question

As far as I know, when people want to find out whether a Kubernetes (or even Spark) Job is done, they usually run some kind of loop that periodically checks the Job's status through the respective API.

Right now, I'm doing that with Kubernetes by running kubectl wait in the background with the shell's & operator (bash inside Python below):

import subprocess

cmd = f'''
kubectl wait \\
    --for=condition=complete \\
    --timeout=-1s \\
    job/job_name \\
    > logs/kube_wait_log.txt \\
    &
'''

kube_listen = subprocess.run(
    cmd,
    shell = True,
    stdout = subprocess.PIPE
)

So... I actually have two (correlated) questions:

1. Is there a better way of doing this in the background with shell other than with the & operator?
2. The option that I think would be best is actually to use cURL from inside the Job to update my Local Server API that interacts with Kubernetes.
   - However, I don't know how I can perform a cURL from a Job. Is it possible?
   - I imagine you would have to expose ports somewhere, but where? And is it really supported? Could you create a Kubernetes Service to manage the ports and connections?

score 4 · Accepted Answer · answered Aug 19 '19 at 23:58

If you don't want to block on a process running to completion, you can create a subprocess.Popen instance instead. Once you have this, you can poll() it to see if it's completed. (You should try really really really hard to avoid using shell=True if at all possible.) So one variation of this could look like (untested):

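import subprocess
import time

# Run kubectl wait without a shell and send its output to the log file.
# job_is_complete() stands for whatever callback you want to run afterwards.
with open('logs/kube_wait_log.txt', 'w') as f:
    with subprocess.Popen(['kubectl', 'wait',
                           '--for=condition=complete',
                           '--timeout=-1s',
                           'job/job_name'],
                          stdin=subprocess.DEVNULL,
                          stdout=f,
                          stderr=subprocess.STDOUT) as p:
        while True:
            # poll() returns None while the process is still running and the
            # exit code once it has finished. A bare "if p.poll():" would
            # never fire on a successful exit code of 0.
            if p.poll() is not None:
                job_is_complete()
                break
            time.sleep(1)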
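
The loop above only tells you that kubectl exited, not whether the wait succeeded. As far as I know kubectl wait exits non-zero when it times out or hits an error, so it can be worth returning the exit code to the caller. A minimal sketch of that idea, where wait_in_background and its parameters are names I made up for illustration:

```python
import subprocess
import time


def wait_in_background(argv, log_path, poll_interval=1.0):
    """Start argv without a shell, poll until it exits, return its exit code.

    Hypothetical helper: stdout and stderr of the child go to log_path.
    """
    with open(log_path, "w") as log:
        with subprocess.Popen(argv,
                              stdin=subprocess.DEVNULL,
                              stdout=log,
                              stderr=subprocess.STDOUT) as proc:
            while True:
                code = proc.poll()  # None while the child is still running
                if code is not None:
                    return code
                time.sleep(poll_interval)
```

You would call it with the same kubectl argument list as above and treat a return value of 0 as "the Job completed" and anything else as "check the log file".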

Better than shelling out to kubectl, though, is using the official Kubernetes Python client library. Rather than using this "wait" operation, you would watch the job object in question and see if its status changes to "completed". This could look roughly like (untested):

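from kubernetes import client, config, watch

config.load_kube_config()  # use config.load_incluster_config() inside a Pod
jobsv1 = client.BatchV1Api()
w = watch.Watch()
# Watches work on list calls, so list the namespace's Jobs and narrow the
# result down to the one Job by name.
for event in w.stream(jobsv1.list_namespaced_job,
                      namespace='default',
                      field_selector='metadata.name=job_name'):
    job = event['object']
    if job.status.completion_time is not None:
        job_is_complete()
        w.stop()
        break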

The Job's Pod doesn't need to update its own status with the Kubernetes server. It just needs to exit with a successful status code (0) when it's done, and that will get reflected in the Job's status field.
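
To make "reflected in the Job's status field" a bit more concrete: the Job object carries a list of conditions, and a finished Job has one of type Complete or Failed with status "True". Here is a rough sketch that classifies a Job from its JSON form, assuming the dict has the shape that `kubectl get job job_name -o json` prints. The function name job_state and the three return values are my own choices, not part of any Kubernetes API.

```python
def job_state(job):
    """Classify a Job dict as 'complete', 'failed', or 'running'.

    Assumes `job` looks like the JSON from `kubectl get job <name> -o json`.
    """
    # status.conditions can be missing or null on a Job that just started.
    conditions = (job.get("status") or {}).get("conditions") or []
    for cond in conditions:
        # A condition only counts when its status is the string "True".
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"].lower()
    return "running"


# Hand-written sample, trimmed down to the fields the function reads.
sample = {
    "status": {
        "succeeded": 1,
        "conditions": [{"type": "Complete", "status": "True"}],
    }
}
print(job_state(sample))
```

Checking for Failed as well as Complete matters if you poll yourself, since a Job that exhausts its retries never gets a completion time.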

Really nice answer. Thank you. Would you confirm that listening until the `Job` is complete is the best practice? Really nice `Python` option you mentioned, but I might still have to use shelling anyway because I think it would make the code more manageable by other people because not everyone here uses `Python`. By the way, do you have `Kubernetes` *resources* you would recommend to newcomers? — Philippe Fanaro, Aug 20 '19 at 11:42
Kubernetes’ native API is in Go, if that’s better. I’d start by going to https://kubernetes.io/ and clicking the “Documentation” link at the top; do as much as you can by writing out YAML files and using `kubectl apply`, try to avoid imperative commands like `kubectl create`. Transitioning from there to using the API is fairly straightforward. — David Maze, Aug 20 '19 at 15:01
