I started researching Spring Batch in the last couple of hours and would appreciate your input.
The problem: read one or more CSV files containing 20 million records in total, perform minor processing on each record, store the results in a database, and also write the output to another flat file, in the least time possible.
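To anchor the discussion, here is the single-process baseline I'd start from before scaling out — a sketch only, in Spring Batch 5 builder style; the bean names, column names, file paths, SQL, and chunk size are all placeholders I made up:

```java
import java.util.List;
import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.batch.item.support.builder.CompositeItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class CsvJobConfig {

    // Placeholder row type; real column names would go here.
    public static class Row {
        private long id;
        private String name;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    @Bean
    public Step csvStep(JobRepository jobRepository,
                        PlatformTransactionManager txManager,
                        DataSource dataSource) {
        // Reads CSV rows, then writes each chunk to the DB and to the
        // output flat file in one transaction via a composite writer.
        var reader = new FlatFileItemReaderBuilder<Row>()
                .name("csvReader")
                .resource(new FileSystemResource("input/data.csv")) // placeholder path
                .delimited()
                .names("id", "name")                                // placeholder columns
                .targetType(Row.class)
                .build();

        var dbWriter = new JdbcBatchItemWriterBuilder<Row>()
                .dataSource(dataSource)
                .sql("INSERT INTO rows (id, name) VALUES (:id, :name)") // placeholder SQL
                .beanMapped()
                .build();

        var fileWriter = new FlatFileItemWriterBuilder<Row>()
                .name("outWriter")
                .resource(new FileSystemResource("output/out.csv"))     // placeholder path
                .delimited()
                .names("id", "name")
                .build();

        return new StepBuilder("csvStep", jobRepository)
                .<Row, Row>chunk(1000, txManager) // chunk size is a guess; needs tuning
                .reader(reader)
                // .processor(...) would hold the "minor processing"
                .writer(new CompositeItemWriterBuilder<Row>()
                        .delegates(List.of(dbWriter, fileWriter))
                        .build())
                .build();
    }
}
```

The scaling question is then which mechanism (remote chunking vs. partitioning) to wrap around a step like this.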
Most important: I need to make choices that will scale horizontally in the future.
Questions:
Should I use remote chunking or partitioning to scale horizontally?
Since the data is in a flat file, are both remote chunking and partitioning bad choices?
Which multi-process solution makes it possible to read from a large file, spread the processing across multiple servers, and update the database, but finally write the output to a single file?
Does MultiResourcePartitioner work across servers?
Do you know of any good tutorials where something like this has been accomplished/demonstrated?
Your thoughts on how this should be attempted, e.g. 1) split the large file into smaller files before starting the job, 2) read one file at a time using an ItemReader, ...
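For option 1, the pre-split itself doesn't need Spring Batch at all. Here is a plain-Java sketch (class name, file names, and part size are my own placeholders) that splits a CSV into part files, repeating the header in each part so every part is independently readable — the resulting files are the kind of thing a MultiResourcePartitioner could then fan out over:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CsvSplitter {

    /**
     * Splits a CSV into part files of at most linesPerPart data lines each,
     * repeating the header line at the top of every part.
     * Returns the part files in order.
     */
    public static List<Path> split(Path input, Path outDir, int linesPerPart)
            throws IOException {
        Files.createDirectories(outDir);
        List<Path> parts = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            String header = in.readLine(); // assumes the first line is a header
            String line;
            BufferedWriter out = null;
            int count = 0, part = 0;
            while ((line = in.readLine()) != null) {
                if (out == null || count == linesPerPart) {
                    if (out != null) out.close();
                    Path p = outDir.resolve("part-" + (part++) + ".csv");
                    parts.add(p);
                    out = Files.newBufferedWriter(p);
                    out.write(header);
                    out.newLine();
                    count = 0;
                }
                out.write(line);
                out.newLine();
                count++;
            }
            if (out != null) out.close();
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Tiny demo: 5 data rows, 2 per part -> 3 parts, last part has 1 row.
        Path in = Files.createTempFile("big", ".csv");
        Files.write(in, List.of("id,name", "1,a", "2,b", "3,c", "4,d", "5,e"));
        Path outDir = Files.createTempDirectory("parts");
        List<Path> parts = split(in, outDir, 2);
        System.out.println(parts.size());                   // 3
        System.out.println(Files.readAllLines(parts.get(2))); // [id,name, 5,e]
    }
}
```

Splitting up front would also let each worker use a plain FlatFileItemReader per part, at the cost of an extra pass over the 20 million rows and extra disk I/O before the job even starts.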