I have a Spring Batch job that partitions work into "slave steps" and runs them in a thread pool. The configuration is described in: Spring Batch - FlatFileItemWriter Error 14416: Stream is already closed
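For context, the partitioned part of the configuration looks roughly like this (a simplified sketch; the bean names, partitioner logic, grid size, and pool size are placeholders rather than my exact code):

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableBatchProcessing
public class PartitionedJobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    // Thread pool that runs the slave step executions in parallel inside one JVM (one pod).
    @Bean
    public TaskExecutor partitionTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(4);
        executor.setMaxPoolSize(4);
        executor.setThreadNamePrefix("partition-");
        return executor;
    }

    // Placeholder partitioner: one ExecutionContext per partition index.
    @Bean
    public Partitioner partitioner() {
        return gridSize -> {
            Map<String, ExecutionContext> partitions = new HashMap<>();
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext context = new ExecutionContext();
                context.putInt("partitionIndex", i);
                partitions.put("partition" + i, context);
            }
            return partitions;
        };
    }

    // Stand-in for my real chunk-oriented slave step (reader/processor/FlatFileItemWriter).
    @Bean
    public Step slaveStep() {
        return stepBuilderFactory.get("slaveStep")
                .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED)
                .build();
    }

    // Master step: splits the work and runs the slave step once per partition on the thread pool.
    @Bean
    public Step masterStep() {
        return stepBuilderFactory.get("masterStep")
                .partitioner("slaveStep", partitioner())
                .step(slaveStep())
                .gridSize(4)
                .taskExecutor(partitionTaskExecutor())
                .build();
    }

    @Bean
    public Job partitionedJob() {
        return jobBuilderFactory.get("partitionedJob")
                .start(masterStep())
                .build();
    }
}
```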
I'd like to run this Spring Batch job in Kubernetes. I checked this post by @Mahmoud Ben Hassine: https://spring.io/blog/2021/01/27/spring-batch-on-kubernetes-efficient-batch-processing-at-scale
From the post, in the paragraph "Choosing the Right Kubernetes Job Concurrency Policy":
- As I pointed out earlier, Spring Batch prevents concurrent job executions of the same job instance. So, if you follow the “Kubernetes job per Spring Batch job instance” deployment pattern, setting the job’s spec.parallelism to a value higher than 1 does not make sense, as this starts two pods in parallel and one of them will certainly fail with a JobExecutionAlreadyRunningException. However, setting a spec.parallelism to a value higher than 1 makes perfect sense for a partitioned job. In this case, partitions can be executed in parallel pods. Correctly choosing the concurrency policy is tightly related to which job pattern is chosen (as explained in point 3).
Looking at my batch job, if I start 2 or more pods, it sounds like one or more of them will fail because they will try to start the same job instance. On the other hand, it also sounds like multiple pods could run in parallel because I am using a partitioned job.
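To make the first point concrete, this is the failure mode I am expecting when every pod launches the job with identical job parameters (a sketch; the parameter name and value are made up):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;

public class PodLaunchExample {

    // Every pod builds the same JobParameters, so every pod targets the same job instance.
    public void launchFromPod(JobLauncher jobLauncher, Job partitionedJob) throws Exception {
        JobParameters parameters = new JobParametersBuilder()
                .addString("inputFile", "/data/input.csv") // made-up parameter; identical in all pods
                .toJobParameters();
        try {
            jobLauncher.run(partitionedJob, parameters);
        } catch (JobExecutionAlreadyRunningException e) {
            // The pod that launches second (or later) ends up here,
            // because the first pod is already executing this job instance.
        }
    }
}
```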
My Spring Batch job seems to be similar to the fine parallel processing with a work queue pattern: https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/
That said, what is the right approach here? How many pods should I set in my deployment? Will the partitions/threads run on separate pods, or will all the threads run in just one pod? Where do I define that, in spec.parallelism? And should the parallelism be the same as the number of threads?
Thank you! Markus.