When a container image is not present on the cluster, the pod fails with the error ErrImageNeverPull, but the Job never fails. Is there a configuration I can add to make sure the Job fails if the pod fails to start?

apiVersion: batch/v1
kind: Job
metadata:
  name: image-not-present
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 120
  template:
    spec:
      serviceAccountName: consolehub
      containers:
      - name: image-not-present
        image: aipaintr/image_not_present:latest
        imagePullPolicy: Never
      restartPolicy: OnFailure
General Grievance
Saurabh Saxena

1 Answer

You can set activeDeadlineSeconds on the Job for this case. However, you have to know how long your Job normally takes to reach Complete status, so that the deadline does not kill a pod that is still doing useful work.

From the documentation:

The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.

For example, I created a Job with a wrong image and activeDeadlineSeconds: 100. As expected, the pod was stuck in Pending status because of the wrong image, which could be seen with kubectl describe pod.

After 100 seconds, the Job was marked Failed and the pod was killed as well, which could be seen with kubectl describe job.
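
The exact manifest used for this test is not shown here, so the following is only a sketch of what it could look like: it reuses the Job from the question and adds activeDeadlineSeconds: 100 at the Job spec level. Everything except that one field is copied from the question, and the value 100 is the one mentioned above, not a recommendation.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: image-not-present
spec:
  backoffLimit: 0
  # Job-level deadline: once the Job has been active for 100 seconds,
  # its pods are terminated and the Job is marked Failed (DeadlineExceeded).
  activeDeadlineSeconds: 100
  ttlSecondsAfterFinished: 120
  template:
    spec:
      serviceAccountName: consolehub   # taken from the question; adjust for your cluster
      containers:
      - name: image-not-present
        image: aipaintr/image_not_present:latest
        imagePullPolicy: Never
      restartPolicy: OnFailure
```

Note that the field goes under the Job's spec, not under template.spec. The pod spec has a field with the same name, but the behaviour quoted from the documentation above is that of the Job-level one. For a real workload the value would need to be larger than the longest expected run time.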

  • This will not work for jobs that take a few minutes to a few hours. In my use case a job can run for up to 4 hours (machine learning task). Is there a way to know if the job is stuck due to some failure? – Saurabh Saxena Jan 14 '23 at 15:16