I have an Indexed Job. On certain runs, Kubernetes (1.24.10 on AKS) creates pods for some indices multiple times before marking the Job as complete. I am at a loss to figure out what might be causing this behaviour.
apiVersion: batch/v1
kind: Job
metadata:
  name: lombard-ttc-lgd-20230228
spec:
  completions: 6
  parallelism: 6
  completionMode: Indexed
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
The describe job output below shows multiple pod creations for the same index, even though all previous executions for that index were successful. Note the zero failed pod statuses. Eventually the Job is marked as complete. Not all indices are retried the same number of times.
$ kubectl describe job lombard-pit-pd-20230228
Name:             lombard-pit-pd-20230228
Namespace:        default
Selector:         controller-uid=111df9b6-d1ad-44af-9e9f-a3aa548bc6f4
Labels:           controller-uid=111df9b6-d1ad-44af-9e9f-a3aa548bc6f4
                  job-name=lombard-pit-pd-20230228
Annotations:
Parallelism:      6
Completions:      6
Start Time:       Fri, 28 Apr 2023 16:06:05 +0100
Pods Statuses:    4 Running / 2 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=111df9b6-d1ad-44af-9e9f-a3aa548bc6f4
           job-name=lombard-pit-pd-20230228
  Containers:
   ubuntu:
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  54m   job-controller  Created pod: lombard-pit-pd-20230228-0-5tk9f
  Normal  SuccessfulCreate  54m   job-controller  Created pod: lombard-pit-pd-20230228-2-xlxxb
  Normal  SuccessfulCreate  54m   job-controller  Created pod: lombard-pit-pd-20230228-1-8mlzj
  Normal  SuccessfulCreate  54m   job-controller  Created pod: lombard-pit-pd-20230228-5-l8wxx
  Normal  SuccessfulCreate  54m   job-controller  Created pod: lombard-pit-pd-20230228-3-fcnd6
  Normal  SuccessfulCreate  54m   job-controller  Created pod: lombard-pit-pd-20230228-4-9t5vt
  Normal  SuccessfulCreate  10m   job-controller  Created pod: lombard-pit-pd-20230228-2-rn2cn
  Normal  SuccessfulCreate  2m8s  job-controller  Created pod: lombard-pit-pd-20230228-5-dctlq
  Normal  SuccessfulCreate  42s   job-controller  Created pod: lombard-pit-pd-20230228-1-clgcs
  Normal  SuccessfulCreate  42s   job-controller  Created pod: lombard-pit-pd-20230228-3-x4hpg
Neither the kubectl events nor the pod logs show any error messages.
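To make the per-index retries easier to see, here is a rough sketch that tallies the "Created pod" events by completion index. It relies on the Indexed Job pod naming pattern of job name, index, then a random suffix. The five-character suffix length and the helper name `creations_per_index` are my own assumptions for illustration, and the pod names are copied by hand from the events listed above.

```python
import re
from collections import Counter

# Pod names copied from the "Created pod:" events in the describe output.
CREATED_PODS = [
    "lombard-pit-pd-20230228-0-5tk9f",
    "lombard-pit-pd-20230228-2-xlxxb",
    "lombard-pit-pd-20230228-1-8mlzj",
    "lombard-pit-pd-20230228-5-l8wxx",
    "lombard-pit-pd-20230228-3-fcnd6",
    "lombard-pit-pd-20230228-4-9t5vt",
    "lombard-pit-pd-20230228-2-rn2cn",
    "lombard-pit-pd-20230228-5-dctlq",
    "lombard-pit-pd-20230228-1-clgcs",
    "lombard-pit-pd-20230228-3-x4hpg",
]

# Assumed naming pattern: <job-name>-<index>-<5-char random suffix>.
POD_NAME = re.compile(r"^(?P<job>.+)-(?P<index>\d+)-(?P<suffix>[a-z0-9]{5})$")


def creations_per_index(pod_names):
    """Count how many pods were created for each completion index."""
    counts = Counter()
    for name in pod_names:
        match = POD_NAME.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed pattern
        counts[int(match.group("index"))] += 1
    return dict(sorted(counts.items()))


counts = creations_per_index(CREATED_PODS)
for index, n in counts.items():
    print(f"index {index}: created {n} time(s)")
```

With the events above, indices 1, 2, 3 and 5 each show two creations, while indices 0 and 4 show one.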