I have to process tasks stored in a work queue and I am launching this kind of Job to do it:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  parallelism: 10
  backoffLimit: 0
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["some", "long", "command"]
      restartPolicy: Never
The problem is that if one of the Pods managed by the Job fails, the Job terminates all the other Pods before they can complete. I would like the Job to be marked as failed, but I do not want its Pods to be terminated: they should keep running and finish processing the items they have already picked up from the queue.
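For completeness, here is the same Job sketched with a podFailurePolicy (batch/v1, Kubernetes 1.25+; the exit code 1 below is just an illustration). As far as I can tell, this still terminates the remaining Pods once the Job is marked failed, so it does not change the behaviour I am describing:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  parallelism: 10
  backoffLimit: 0
  podFailurePolicy:
    rules:
    # Illustrative rule: fail the whole Job if a container exits with code 1.
    # The Job controller still terminates the other running Pods afterwards.
    - action: FailJob
      onExitCodes:
        operator: In
        values: [1]
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["some", "long", "command"]
      restartPolicy: Never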
Is there a way to do that please?