
I have a scenario where I need to load a large flat file into an H2 database using Spring Batch. I am using a CustomReader and a JdbcBatchItemWriter for this. After loading, I found duplicate rows in the database, but only when loading large data sets (millions of rows).
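For reference, here is a rough sketch of the kind of step configuration I mean; the Record class, the table and column names, and the chunk size of 1000 are placeholders, not the actual project code:

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LoadJobConfig {

    // Placeholder domain class representing one flat-file record.
    public static class Record {
        private String colA;
        private String colB;
        public String getColA() { return colA; }
        public void setColA(String colA) { this.colA = colA; }
        public String getColB() { return colB; }
        public void setColB(String colB) { this.colB = colB; }
    }

    @Bean
    public JdbcBatchItemWriter<Record> writer(DataSource dataSource) {
        JdbcBatchItemWriter<Record> writer = new JdbcBatchItemWriter<>();
        writer.setDataSource(dataSource);
        // Plain INSERT into the H2 table; table/column names are placeholders.
        writer.setSql("INSERT INTO RECORDS (COL_A, COL_B) VALUES (?, ?)");
        writer.setItemPreparedStatementSetter((item, ps) -> {
            ps.setString(1, item.getColA());
            ps.setString(2, item.getColB());
        });
        return writer;
    }

    @Bean
    public Step loadStep(StepBuilderFactory steps,
                         ItemReader<Record> customReader,   // our CustomReader bean
                         JdbcBatchItemWriter<Record> writer) {
        return steps.get("loadStep")
                .<Record, Record>chunk(1000)                // the commit-interval
                .reader(customReader)
                .writer(writer)
                .build();
    }
}
```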

Reducing the commit-interval value decreases the number of duplicates, but this is not an optimal solution because it hurts performance.

Adding a unique key to the table also seems to solve the problem, but the requirements do not allow us to alter the table structure.

Any other solution would be very helpful.

Thanks

Tama10
  • I think you can use http://stackoverflow.com/questions/25021689/spring-batch-reading-a-large-flat-file-choices-to-scale-horizontally?rq=1 as a base – Luca Basso Ricci Apr 17 '15 at 07:01
    "By reducing the commit-interval value ,there has been a decrease in the duplicates.But this is not an optimal solution as there would be an impact on performance by doing so" .... Why does reducing the commit interval reduce the duplicate row count? Is you logic filtering out duplicates? – Saifuddin Merchant Apr 18 '15 at 16:48
  • Thanks Luca, but I'm afraid we cannot use that approach, as we do not know the gridSize for the input. – Tama10 Apr 22 '15 at 05:39
  • @SaifuddinMerchant - No, I don't think our logic does this. The JdbcBatchItemWriter has built-in logic that retries processing of a chunk when an exception occurs, which is why reducing the commit-interval to 1 removes the duplicates. That is what I meant. So now I need to create a custom JdbcBatchItemWriter and override the write method that invokes the RetryTemplate to process the failed chunk again. – Tama10 Apr 22 '15 at 05:47
  • You could set your custom retry policy on the JDBC writer (see the sketch after these comments)... – Saifuddin Merchant Apr 22 '15 at 05:55
  • Yes, that is what I'm trying. Do you have any samples that I could use? – Tama10 Apr 22 '15 at 06:59
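For anyone following along, here is a minimal sketch of what attaching a retry policy might look like, reusing the placeholder Record class and bean names from the configuration sketch above. Note that in Spring Batch the retry behaviour is configured on a fault-tolerant step rather than on the JdbcBatchItemWriter itself; whether this actually removes the duplicates in this scenario is untested.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.dao.DeadlockLoserDataAccessException;
import org.springframework.retry.policy.NeverRetryPolicy;

@Configuration
public class RetryPolicyLoadJobConfig {

    // Replaces the loadStep bean from the earlier sketch; Record is the
    // placeholder type defined there.
    @Bean
    public Step loadStep(StepBuilderFactory steps,
                         ItemReader<LoadJobConfig.Record> customReader,
                         JdbcBatchItemWriter<LoadJobConfig.Record> writer) {
        return steps.get("loadStep")
                .<LoadJobConfig.Record, LoadJobConfig.Record>chunk(1000)
                .reader(customReader)
                .writer(writer)
                .faultTolerant()
                // Option 1: never retry a failed chunk.
                .retryPolicy(new NeverRetryPolicy())
                // Option 2 (instead of option 1): retry only specific
                // exceptions, a limited number of times.
                // .retry(DeadlockLoserDataAccessException.class)
                // .retryLimit(3)
                .build();
    }
}
```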

0 Answers