
I have a Python process that I want to fire up every n minutes in a Kubernetes CronJob. It should read a number of messages (say 5) from a queue, then process/convert some files and run analysis on the results based on those queue messages. If the process is still running after n minutes, I don't want to start a new one. In total, I would like a number of these (say 3) to be able to run at the same time; however, there can never be more than 3 processes running simultaneously. To try to implement this, I tried the following (simplified):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: some-job
  namespace: some-namespace
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      parallelism: 3
      template:
        spec:
          containers:
          - name: job
            image: myimage:tag
            imagePullPolicy: Always
            command: ['python', 'src/run_job.py']
          restartPolicy: Never

What this amounts to is a maximum of three pods running at the same time (due to parallelism: 3) and no new job starting while the previous one is still running (due to concurrencyPolicy: "Forbid"), even if the pods go past the 5-minute mark.

The problem I specifically have is that one pod (e.g. pod 1) can take longer than the other two to finish: pods 2 and 3 might finish after a minute, while pod 1 only finishes after 10 minutes because it is processing larger files from the queue.

Where I thought that parallelism: 3 would cause pods 2 and 3 to be deleted and replaced after finishing (when the next cron interval hits), they are not: the job as a whole has to wait for pod 1 to finish before the next cron interval can start three new pods.

When I think about it, this behaviour makes sense given the specification and meaning of what a CronJob is. However, I would like to know whether it is possible to make these pods/processes independent of one another for restarts, without having to define duplicate CronJobs that each run a single process.

Otherwise, I would like to know if it's possible to easily launch duplicate CronJobs without copying them into multiple manifests.

Tim
  • Duplicate CronJobs seem to be the way to achieve what you are looking for. Produce 3 duplicates with a single job at a time. You could template the job manifest and produce multiple copies, as in the following example. The example is not in your problem context, but you can get the idea: https://kubernetes.io/docs/tasks/job/parallel-processing-expansion/ – gordanvij Jul 27 '22 at 09:45
  • @gordanvij This seems to be exactly what I'm looking for. Thanks for the info. – Tim Jul 27 '22 at 14:00
  • @gordanvij, please post your comment as an answer for greater visibility to the community. – Srividya Jul 28 '22 at 06:20
  • Thank you @Srividya for the reminder. Tim, please mark it as answered if it works for you. – gordanvij Jul 31 '22 at 11:10

1 Answer


Duplicate CronJobs seem to be the way to achieve what you are looking for: produce 3 duplicates, each running a single job at a time. You can template the job manifest and generate multiple copies from it, as in the following example. The example is not in your problem context, but you can get the idea. http://kubernetes.io/docs/tasks/job/parallel-processing-expansion
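Following the pattern from that page, here is a minimal sketch of what the templating could look like for this question's manifest. The `render_cronjobs` helper and the `$index` placeholder are illustrative, not from the linked docs; the CronJob fields are taken from the question:

```python
from string import Template

# Template for one single-pod CronJob; $index distinguishes the duplicates.
# Each duplicate has parallelism removed, so each schedule runs exactly
# one pod and "Forbid" only blocks that one schedule's own reruns.
CRONJOB_TEMPLATE = Template("""\
apiVersion: batch/v1
kind: CronJob
metadata:
  name: some-job-$index
  namespace: some-namespace
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: job
            image: myimage:tag
            imagePullPolicy: Always
            command: ['python', 'src/run_job.py']
""")


def render_cronjobs(count):
    """Render one CronJob manifest per replica index (1..count)."""
    return [CRONJOB_TEMPLATE.substitute(index=i) for i in range(1, count + 1)]
```

You would write each rendered manifest to its own file (or a single multi-document YAML) and `kubectl apply -f` the result. Since every CronJob now owns a single pod, a slow pod in one schedule no longer delays restarts of the other two.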

gordanvij