
I have read a lot of articles about scaling Spring Batch on cloud platforms, and I have also watched Michael Minella's talk on high-performance batch processing on YouTube (https://www.youtube.com/watch?v=J6IPlfm7N6w).

My use case is processing a large file (more than 1 GB) with Spring Batch on PCF. I understand that the file can be split and that the DeployerPartitionHandler class can be used to dynamically start a new PCF instance per partition/file, but the catch is that we don't have Spring Cloud Data Flow or the Spring Cloud services enabled in our PCF environment.
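For reference, the partitioned setup I am referring to looks roughly like this. It is only a minimal sketch based on the talk, assuming spring-cloud-task and a spring-cloud-deployer-cloudfoundry TaskLauncher on the classpath; the artifact coordinates, step name, profile, and worker cap are all placeholders of mine:

```java
import java.util.Collections;

import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.partition.PartitionHandler;
import org.springframework.cloud.deployer.spi.task.TaskLauncher;
import org.springframework.cloud.task.batch.partition.DeployerPartitionHandler;
import org.springframework.cloud.task.batch.partition.NoOpEnvironmentVariablesProvider;
import org.springframework.cloud.task.batch.partition.PassThroughCommandLineArgsProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;

@Configuration
public class PartitionedJobConfig {

    @Bean
    public PartitionHandler partitionHandler(TaskLauncher taskLauncher,
                                             JobExplorer jobExplorer,
                                             ResourceLoader resourceLoader) {
        // The worker artifact that PCF launches per partition (placeholder coordinates)
        Resource workerJar = resourceLoader
                .getResource("maven://com.example:file-worker:1.0.0");

        DeployerPartitionHandler handler =
                new DeployerPartitionHandler(taskLauncher, jobExplorer, workerJar, "workerStep");

        // Launched instances run only the worker step, selected via a profile (my naming)
        handler.setCommandLineArgsProvider(new PassThroughCommandLineArgsProvider(
                Collections.singletonList("--spring.profiles.active=worker")));
        handler.setEnvironmentVariablesProvider(new NoOpEnvironmentVariablesProvider());
        handler.setMaxWorkers(10); // upper bound on dynamically started PCF instances
        handler.setApplicationName("file-worker");

        return handler;
    }
}
```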

I saw that Spring Batch can be combined with Spring Integration and RabbitMQ to remote-chunk the large file using a master/worker configuration. But those workers have to be started manually in PCF as separate instances, and as the load grows we would have to start more worker instances by hand.
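Concretely, the master side I have in mind looks like the sketch below, using the `@EnableBatchIntegration` builders from spring-batch-integration (Spring Batch 4.1+; the factory is named `RemoteChunkingManagerStepBuilderFactory` in newer versions). The queue names and chunk size are placeholders of mine:

```java
import org.springframework.amqp.core.AmqpTemplate;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.batch.core.step.tasklet.TaskletStep;
import org.springframework.batch.integration.chunk.RemoteChunkingMasterStepBuilderFactory;
import org.springframework.batch.integration.config.annotation.EnableBatchIntegration;
import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.amqp.dsl.Amqp;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.channel.QueueChannel;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;

@Configuration
@EnableBatchIntegration
public class MasterChunkingConfig {

    @Autowired
    private RemoteChunkingMasterStepBuilderFactory masterStepBuilderFactory;

    @Bean
    public DirectChannel requests() {
        return new DirectChannel();
    }

    @Bean
    public QueueChannel replies() {
        return new QueueChannel();
    }

    // Only this master step reads the 1 GB file; workers never touch it.
    // The reader bean (e.g. a FlatFileItemReader over the file) is defined elsewhere.
    @Bean
    public TaskletStep masterStep(ItemReader<String> fileReader) {
        return masterStepBuilderFactory.get("masterStep")
                .chunk(1000)               // placeholder chunk size
                .reader(fileReader)
                .outputChannel(requests()) // chunks serialized out to RabbitMQ
                .inputChannel(replies())   // acks coming back from workers
                .build();
    }

    // Bridge the request channel onto a RabbitMQ queue ("chunk-requests" is my placeholder)
    @Bean
    public IntegrationFlow outboundRequests(AmqpTemplate amqpTemplate) {
        return IntegrationFlows.from(requests())
                .handle(Amqp.outboundAdapter(amqpTemplate).routingKey("chunk-requests"))
                .get();
    }

    @Bean
    public IntegrationFlow inboundReplies(ConnectionFactory connectionFactory) {
        return IntegrationFlows.from(Amqp.inboundAdapter(connectionFactory, "chunk-replies"))
                .channel(replies())
                .get();
    }
}
```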

Is there any other way, provided by Spring Batch or PCF, to autoscale the worker instances based on load? Or is there a way to dynamically start a new PCF instance whenever the master has a chunk ready while it is reading the file?

FYI: if I use the PCF Autoscaler based on a metric such as CPU utilization, every new instance reads the whole file again and reprocesses it, since each scaled instance is a copy of the same app that contains the file reader.
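The best I have come up with is deploying the worker side as a separate app (or the same jar gated behind a `worker` profile, my naming), so that autoscaled instances contain no reader at all and just pull chunks from RabbitMQ instead of re-reading the file. A sketch, reusing the placeholder queue names from above; but note the instances would still be scaled on CPU rather than on queue depth, which is what my question is really about:

```java
import org.springframework.amqp.core.AmqpTemplate;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.batch.integration.chunk.RemoteChunkingWorkerBuilder;
import org.springframework.batch.integration.config.annotation.EnableBatchIntegration;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.integration.amqp.dsl.Amqp;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;

@Configuration
@EnableBatchIntegration
@Profile("worker") // autoscaled instances activate only this config, so the file is never re-read
public class WorkerChunkingConfig {

    @Autowired
    private RemoteChunkingWorkerBuilder<String, String> workerBuilder;

    @Bean
    public DirectChannel workerRequests() {
        return new DirectChannel();
    }

    @Bean
    public DirectChannel workerReplies() {
        return new DirectChannel();
    }

    // Pull chunk requests off the same RabbitMQ queue the master writes to
    @Bean
    public IntegrationFlow inboundRequests(ConnectionFactory connectionFactory) {
        return IntegrationFlows.from(Amqp.inboundAdapter(connectionFactory, "chunk-requests"))
                .channel(workerRequests())
                .get();
    }

    // Process and write each chunk; the processor/writer beans are defined elsewhere
    @Bean
    public IntegrationFlow workerFlow(ItemProcessor<String, String> processor,
                                      ItemWriter<String> writer) {
        return workerBuilder
                .inputChannel(workerRequests())
                .outputChannel(workerReplies())
                .itemProcessor(processor)
                .itemWriter(writer)
                .build();
    }

    // Send acks back to the master's reply queue
    @Bean
    public IntegrationFlow outboundReplies(AmqpTemplate amqpTemplate) {
        return IntegrationFlows.from(workerReplies())
                .handle(Amqp.outboundAdapter(amqpTemplate).routingKey("chunk-replies"))
                .get();
    }
}
```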
