
I have a file of 50K records. It takes close to 40 minutes to insert it into a DB. So I thought of partitioning the step such that the 50k records are split across 10 threads (via gridSize), with each thread processing 1000 records in parallel.

All the forums show examples of using JDBCPagingItemReader with the partition range set via the execution context. Since I am using MultiResourceItemReader, how can I set the partition range (startingIndex and endingIndex; refer to the code snippet below) for MultiResourceItemReader?

Please advise.

Code snippet of partitioner below:

public Map<String, ExecutionContext> partition(int gridSize) {
    LOGGER.debug("START: Partition");
    Map<String, ExecutionContext> partitionMap = new HashMap<>();
    final int range = 1000;              // records per partition
    int startingIndex = 0;
    int endingIndex = range - 1;         // inclusive, so each slice holds exactly 1000 records

    for (int i = 0; i < gridSize; i++) {
        ExecutionContext ctxMap = new ExecutionContext();
        ctxMap.putInt("startingIndex", startingIndex);
        ctxMap.putInt("endingIndex", endingIndex);

        startingIndex += range;
        endingIndex += range;

        partitionMap.put("Thread:-" + i, ctxMap);
    }
    LOGGER.debug("END: Created Partitions of size: " + partitionMap.size());
    return partitionMap;
}
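For completeness, here is a sketch (untested; the bean id, file path, and lineMapper reference are placeholders) of how a step-scoped reader could consume those context keys through late binding, assuming each slice is an inclusive [startingIndex, endingIndex] line range over a single flat file:

```xml
<!-- Sketch only: a step-scoped reader bound to this partition's slice. -->
<bean id="slicedReader" scope="step"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="file:records.csv"/>
    <!-- skip everything before this partition's first record -->
    <property name="linesToSkip"
              value="#{stepExecutionContext['startingIndex']}"/>
    <!-- stop after reading exactly this partition's share of records -->
    <property name="maxItemCount"
              value="#{stepExecutionContext['endingIndex'] - stepExecutionContext['startingIndex'] + 1}"/>
    <property name="lineMapper" ref="lineMapper"/>
</bean>
```

Note that each partition still opens the file and skips lines from the top to reach its offset, so this does not reduce the total IO on the input file.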
Gopi

1 Answer


You don't set the partition count on the MultiResourceItemReader. You use the MultiResourcePartitioner to create one partition per resource (file) and have the reader pick up each file separately as its own partition. With that configuration you don't need the MultiResourceItemReader anymore either (you can go straight to the delegate).

There is a sample of this use case in the Spring Batch samples; it can be found here: https://github.com/spring-projects/spring-batch/blob/master/spring-batch-samples/src/main/resources/jobs/partitionFileJob.xml
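As an illustration (untested; the resource pattern and bean ids are invented for the example), the partitioner plus a step-scoped delegate reader would look roughly like this in XML config:

```xml
<!-- One partition per matched file; each partition's ExecutionContext
     gets a "fileName" entry pointing at its resource. -->
<bean id="partitioner"
      class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
    <property name="resources" value="file:/tmp/splits/record-chunk-*"/>
</bean>

<!-- The former delegate of the MultiResourceItemReader, now step-scoped
     so each partition reads only its own file. -->
<bean id="itemReader" scope="step"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="#{stepExecutionContext['fileName']}"/>
    <property name="lineMapper" ref="lineMapper"/>
</bean>
```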

Michael Minella
  • Thanks for responding. I don't want a partition per resource (file); I want the partitioning to happen within a single file. Will this work for my scenario? – Gopi Dec 22 '15 at 17:34
  • Partitioning within a single file typically doesn't improve performance since the process is usually IO bound anyway. What is the bottleneck here? 50k records shouldn't take that long to insert unless there is some other bottleneck... – Michael Minella Dec 23 '15 at 04:13
  • I am trying to find the bottleneck. Is there any way I can create multiple threads and configure each thread to process a specific number of records based on the commit-interval? – Gopi Dec 23 '15 at 12:38
  • Can you split the file? – Michael Minella Dec 28 '15 at 17:54
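Following up on the last comment: the input file can be split up front so that the MultiResourcePartitioner gets one chunk file per partition. A minimal stdlib sketch (the class name and chunk-file naming are illustrative, and readAllLines assumes the file fits in memory):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split one large flat file into fixed-size chunk
// files so MultiResourcePartitioner can assign one chunk per partition.
public class FileSplitter {

    /** Writes chunks of at most linesPerChunk lines; returns the chunk paths. */
    public static List<Path> split(Path input, Path outDir, int linesPerChunk) throws IOException {
        Files.createDirectories(outDir);
        List<String> lines = Files.readAllLines(input);
        List<Path> chunks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerChunk) {
            int end = Math.min(start + linesPerChunk, lines.size());
            Path chunk = outDir.resolve("record-chunk-" + chunks.size() + ".txt");
            Files.write(chunk, lines.subList(start, end));
            chunks.add(chunk);
        }
        return chunks;
    }
}
```

The resulting files can then be matched by the partitioner's resources pattern (e.g. `file:/tmp/splits/record-chunk-*`).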