I'm using Mark's solution with spec.jobTemplate.spec.activeDeadlineSeconds.
There is one more thing to be aware of, though. From the K8S docs:
"Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded."
What actually happens when the pod is terminated is that K8S sends a SIGTERM to the process with PID 1 in the pod's container. It does not wait for that process to actually terminate. If your container does not terminate gracefully, the pod stays in the Terminating state for 30 seconds (the terminationGracePeriodSeconds), after which K8S sends a SIGKILL. In the meantime, K8S may schedule another pod, so the terminating one can overlap with the newly scheduled one for up to 30 seconds.
This is easily reproducible with this CronJob definition:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cj-sleep
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 5
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      activeDeadlineSeconds: 50
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - command:
            - "/usr/local/bin/bash"
            - "-c"
            - "--"
            args:
            - "tail -f /dev/null & wait $!"
            image: bash
            imagePullPolicy: IfNotPresent
            name: cj-sleep
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '* * * * *'
  startingDeadlineSeconds: 100
  successfulJobsHistoryLimit: 5
This is how the scheduling happens:
while true; do date; kubectl get pods -A | grep cj-sleep; sleep 1; done
Thu Sep 3 09:50:51 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Running 0 49s
Thu Sep 3 09:50:53 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 50s
Thu Sep 3 09:50:54 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 51s
Thu Sep 3 09:50:55 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 52s
Thu Sep 3 09:50:56 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 54s
Thu Sep 3 09:50:58 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 56s
Thu Sep 3 09:51:00 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 57s
Thu Sep 3 09:51:01 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 58s
Thu Sep 3 09:51:02 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 59s
Thu Sep 3 09:51:03 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 60s
default cj-sleep-1599126660-l69gd 0/1 ContainerCreating 0 0s
Thu Sep 3 09:51:04 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 61s
default cj-sleep-1599126660-l69gd 0/1 ContainerCreating 0 1s
Thu Sep 3 09:51:05 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 62s
default cj-sleep-1599126660-l69gd 1/1 Running 0 2s
....
Thu Sep 3 09:51:29 UTC 2020
default cj-sleep-1599126600-kzzxg 0/1 Terminating 0 86s
default cj-sleep-1599126660-l69gd 1/1 Running 0 26s
Thu Sep 3 09:51:30 UTC 2020
default cj-sleep-1599126660-l69gd 1/1 Running 0 28s
Thu Sep 3 09:51:32 UTC 2020
default cj-sleep-1599126660-l69gd 1/1 Running 0 29s
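The SIGTERM, grace period, SIGKILL sequence can also be imitated locally without a cluster. The following is only a rough sketch of that sequence, not what the kubelet actually runs: terminate_with_grace and RESULT are names made up for this example, and the grace period is shortened to 2 seconds so the demo finishes quickly.

```shell
#!/usr/bin/env bash
# Local imitation of the stop sequence: SIGTERM first,
# SIGKILL once the grace period has elapsed.
# terminate_with_grace and RESULT are hypothetical names, not Kubernetes tooling.
terminate_with_grace() {
  local pid=$1 grace=$2 waited=0
  kill -TERM "$pid" 2>/dev/null
  # kill -0 sends no signal, it only probes whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      kill -KILL "$pid" 2>/dev/null
      wait "$pid" 2>/dev/null
      RESULT="killed"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  RESULT="exited"
}

# A process that ignores SIGTERM, like a container main process without a handler.
( trap '' TERM; exec sleep 30 ) >/dev/null 2>&1 &
stubborn=$!
sleep 1   # give the subshell time to install the trap
terminate_with_grace "$stubborn" 2
stubborn_result=$RESULT

# A process that keeps the default SIGTERM action.
sleep 30 >/dev/null 2>&1 &
polite=$!
terminate_with_grace "$polite" 2
polite_result=$RESULT

echo "stubborn: $stubborn_result, polite: $polite_result"
```

The process that ignores SIGTERM is only removed by the SIGKILL at the end of the grace period, which corresponds to the long Terminating phase in the output above. The process with the default SIGTERM action goes away right after the first signal.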
There is a detail specific to PID 1 processes: the default SIGTERM action is not applied to them, so the signal is ignored unless the process installs its own handler. In the case of bash, you do that by adding a trap:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cj-sleep
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 5
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      activeDeadlineSeconds: 50
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - command:
            - "/usr/local/bin/bash"
            - "-c"
            - "--"
            args:
            - "trap 'exit' SIGTERM; tail -f /dev/null & wait $!"
            image: bash
            imagePullPolicy: IfNotPresent
            name: cj-sleep
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '* * * * *'
  startingDeadlineSeconds: 100
  successfulJobsHistoryLimit: 5
And now this is how the scheduling happens:
Thu Sep 3 09:47:54 UTC 2020
default cj-sleep-1599126420-sm887 1/1 Terminating 0 52s
Thu Sep 3 09:47:56 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 54s
Thu Sep 3 09:47:57 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 55s
Thu Sep 3 09:47:58 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 56s
Thu Sep 3 09:47:59 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 57s
Thu Sep 3 09:48:00 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 58s
Thu Sep 3 09:48:01 UTC 2020
Thu Sep 3 09:48:02 UTC 2020
default cj-sleep-1599126480-rlhlw 0/1 ContainerCreating 0 1s
Thu Sep 3 09:48:04 UTC 2020
default cj-sleep-1599126480-rlhlw 0/1 ContainerCreating 0 2s
Thu Sep 3 09:48:05 UTC 2020
default cj-sleep-1599126480-rlhlw 0/1 ContainerCreating 0 3s
Thu Sep 3 09:48:06 UTC 2020
default cj-sleep-1599126480-rlhlw 1/1 Running 0 4s
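For anything longer than a one-liner, the same trap idea can be moved into an entrypoint script. The following is a minimal sketch under a few assumptions: run_with_term_trap is a hypothetical helper name, the workload is started in the background so that bash sits in wait and can run the trap immediately, and exit status 143 (128 + 15) is just the usual convention for "terminated by SIGTERM". The demo at the bottom runs the function in a background subshell and signals it, since PID 1 behaviour itself cannot be shown outside a container.

```shell
#!/usr/bin/env bash
# Entrypoint-style helper: run the workload in the background and
# react to SIGTERM instead of ignoring it.
# run_with_term_trap is a hypothetical name used only for this sketch.
run_with_term_trap() {
  "$@" &
  child=$!
  # On SIGTERM: forward the signal to the workload, reap it, then exit.
  trap 'kill -TERM "$child" 2>/dev/null; wait "$child" 2>/dev/null; exit 143' TERM
  # wait returns as soon as the trapped signal arrives, then the trap runs.
  wait "$child"
}

# Demo: start the helper in a background subshell and stop it with SIGTERM.
( run_with_term_trap sleep 30 ) >/dev/null 2>&1 &
pid=$!
sleep 1                      # let the subshell install the trap
start=$SECONDS
kill -TERM "$pid"
wait "$pid"
status=$?
elapsed=$((SECONDS - start))
echo "exit status $status after ${elapsed}s"
```

In a container, the script would end with a call such as run_with_term_trap followed by the real command, so that the pod leaves the Terminating state as soon as the SIGTERM arrives rather than after the grace period.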