
Presently I am using Spring Batch to process CSV and Excel files in the following manner:

  1. Reader (parses the CSV/Excel file and maps each record to a POJO)
  2. Processor (hits the DB to check whether the record already exists)
  3. Writer (pushes the POJO to a message queue)

In production I have 50k+ records to process, and my code takes almost 25 minutes. I want to improve the processing time by implementing parallel processing, so that the same work can be done in less time.

But I have no clue how to achieve parallel processing with Spring Batch. Can anyone guide me on how to do it, or give any suggestions to improve the processing time?

@Bean
    public TaskExecutor taskExecutor(){
        return new SimpleAsyncTaskExecutor("CSV-Async-batch");
    }


    @Bean(name="csvjob")
    public Job job(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory,
            ItemReader<List<CSVPojo>> itemReader,
            ItemProcessor<List<CSVPojo>, CsvWrapperPojo> itemProcessor,
            AmqpItemWriter<CsvWrapperPojo> itemWriter) {
        Step step = stepBuilderFactory.get("ETL-CSV").<List<CSVPojo>, CsvWrapperPojo>chunk(100)
                .reader(itemReader)
                .processor(itemProcessor)
                .writer(itemWriter)
                .taskExecutor(taskExecutor())
                .throttleLimit(40)
                .build();

        Job csvJob = jobBuilderFactory.get("ETL")
                .incrementer(new RunIdIncrementer())
                .start(step)
                .build();
        return csvJob;
    }
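For comparison, one common change here is to replace `SimpleAsyncTaskExecutor` (which spawns a brand-new thread for every task, with no upper bound) with a bounded `ThreadPoolTaskExecutor`. A sketch, with purely illustrative pool sizes:

```java
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// A bounded pool instead of SimpleAsyncTaskExecutor; the sizes below
// are placeholders to tune, not recommended values.
@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(8);        // threads kept alive
    executor.setMaxPoolSize(8);         // hard upper bound on threads
    executor.setQueueCapacity(100);     // backlog before submission blocks
    executor.setThreadNamePrefix("csv-batch-");
    executor.initialize();
    return executor;
}
```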

==== Reader for SynchronizedItemStreamReader ====

@Component
public class Reader extends SynchronizedItemStreamReader<List<CSVPojo>> {

    public static MultipartFile reqFile=null;
    List<CSVPojo> result = new ArrayList<CSVPojo>();

    @Autowired
    private CSVProcessService csvProcessService;

    public static boolean batchJobState;

    public void setDelegate(ItemStreamReader<List<CSVPojo>> delegate) {
    }


    @Override
    public List<CSVPojo> read() throws Exception, UnexpectedInputException,
            ParseException, NonTransientResourceException {
        if (!batchJobState) {
            result = csvProcessService.processCSVFile(reqFile);
            System.out.println("in batch job reader");
            batchJobState = true;
            return result;
        }
        return null;
    }

}

Thanks in advance!!!

user3853393

1 Answer


You can use the partitioning technique to partition the input files and process them in parallel. This is explained in detail in the Partitioning section of the reference documentation.

You can also look at the Local partitioning sample and the Remote partitioning sample in the spring-batch-samples module.
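A local-partitioning setup along those lines could be sketched as follows, with one worker step execution per input file. The resource pattern, grid size, and bean names are assumptions for illustration:

```java
import java.io.IOException;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.task.TaskExecutor;

// One partition per matching file; each partition's step execution
// context gets the file URL under the key "fileName".
@Bean
public Partitioner partitioner() throws IOException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(new PathMatchingResourcePatternResolver()
            .getResources("file:input/*.csv"));   // placeholder location
    return partitioner;
}

// Master step that fans partitions out to the worker step in parallel.
@Bean
public Step masterStep(StepBuilderFactory stepBuilderFactory, Step workerStep,
                       Partitioner partitioner, TaskExecutor taskExecutor) {
    return stepBuilderFactory.get("masterStep")
            .partitioner(workerStep.getName(), partitioner)
            .step(workerStep)
            .gridSize(4)                 // number of concurrent partitions
            .taskExecutor(taskExecutor)
            .build();
}
```

The worker step's reader can then be step-scoped and bind its input file with `@Value("#{stepExecutionContext['fileName']}")`, so each partition reads only its own file.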

There are similar questions to this one; I'm adding them here for reference:

Hope this helps.

Mahmoud Ben Hassine
  • Thanks, the above links are helpful. I used the multi-threaded step approach and added my configuration code to the main post above. I have 2 records in the CSV file, but it seems multiple threads acted on the same file asynchronously and all of them read it and published to the queue, so I saw 24 records published (instead of 2). Is there any way the file can be read by multiple threads in a synchronized way? – user3853393 Nov 30 '18 at 14:24
  • I mean, how can we make the batch job reader thread-safe, so that reads on the file happen in a synchronized manner? – user3853393 Dec 06 '18 at 19:12
  • [`SynchronizedItemStreamReader`](https://docs.spring.io/spring-batch/4.1.x/api/org/springframework/batch/item/support/SynchronizedItemStreamReader.html) is what you are looking for. – Mahmoud Ben Hassine Dec 06 '18 at 19:51
  • I am getting the error: Error creating bean with name 'reader' defined in URL, Invocation of init method failed; nested exception is java.lang.IllegalArgumentException: A delegate item reader is required. Do I have to override public void setDelegate() in my reader? Please see the reader code in the post above. – user3853393 Dec 07 '18 at 19:30
  • No, you don't need to override `setDelegate`; you need to set the delegate reader on the synchronized one. Here is an example: https://github.com/spring-projects/spring-batch/blob/master/spring-batch-infrastructure/src/test/java/org/springframework/batch/item/support/SynchronizedItemStreamReaderTests.java#L92-L94 – Mahmoud Ben Hassine Dec 07 '18 at 19:43
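The delegate wiring described in the last comment could look roughly like this: instead of extending `SynchronizedItemStreamReader`, declare one and set a regular reader as its delegate. The file path and column names below are placeholders:

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.core.io.FileSystemResource;

// Sketch: wrap a regular FlatFileItemReader in a SynchronizedItemStreamReader
// so that read() calls from multiple step threads are serialized.
@Bean
public SynchronizedItemStreamReader<CSVPojo> reader() {
    FlatFileItemReader<CSVPojo> delegate = new FlatFileItemReaderBuilder<CSVPojo>()
            .name("csvDelegateReader")
            .resource(new FileSystemResource("input/records.csv"))  // placeholder path
            .delimited()
            .names(new String[] {"field1", "field2"})               // placeholder columns
            .targetType(CSVPojo.class)
            .build();

    SynchronizedItemStreamReader<CSVPojo> reader = new SynchronizedItemStreamReader<>();
    reader.setDelegate(delegate);   // the delegate does the actual reading
    return reader;
}
```

This avoids the "A delegate item reader is required" failure, since the delegate is now set explicitly, and each item is read exactly once even with a multi-threaded step.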