Problem statement : I need to compare data between two different databases and report the mismatches in an Excel file. One of the data sources, DB2, is the primary source, which I trust; the secondary data source (HANA in this case) needs to be compared against it. I am new to Spring Batch. Any help is appreciated.
Design approach : I started with chunk-oriented processing, using an ItemReader to read the primary data (one day's worth, since I want the mismatches reported daily). From the Spring documentation and the Stack Overflow answers I have read so far, the suggested approach is to read the primary data source through an ItemReader, fetch the corresponding data from the secondary database for each item, and build a summary object that is then written to an Excel or CSV file.
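To make the per-item idea concrete, here is a minimal plain-Java sketch of the comparison logic I have in mind. It is not wired into Spring Batch; `Row`, `Mismatch`, and the `secondaryLookup` function are hypothetical names standing in for my real row type, summary object, and the HANA query. It returns `null` when the values agree, which matches how an `ItemProcessor` filters items out of a chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

class PerItemCompareSketch {

    // Hypothetical row shape: a business key plus the one value being compared.
    record Row(String key, String value) {}

    // Hypothetical summary object that would become one line of the report.
    record Mismatch(String key, String primaryValue, String secondaryValue) {}

    // Mirrors the ItemProcessor contract: return null when there is nothing to report.
    static Mismatch process(Row primary, Function<String, Row> secondaryLookup) {
        // One lookup per item; in the real job this would be a query against HANA.
        Row secondary = secondaryLookup.apply(primary.key());
        if (secondary == null) {
            // The key exists in the primary source but not in the secondary one.
            return new Mismatch(primary.key(), primary.value(), null);
        }
        if (Objects.equals(primary.value(), secondary.value())) {
            return null; // values agree, nothing to write
        }
        return new Mismatch(primary.key(), primary.value(), secondary.value());
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the two databases (assumed sample data).
        Map<String, Row> hana = Map.of("A", new Row("A", "1"), "B", new Row("B", "9"));
        List<Row> db2 = List.of(new Row("A", "1"), new Row("B", "2"), new Row("C", "3"));

        List<Mismatch> report = new ArrayList<>();
        for (Row row : db2) {
            Mismatch mismatch = process(row, hana::get);
            if (mismatch != null) {
                report.add(mismatch);
            }
        }
        report.forEach(System.out::println);
    }
}
```

In the real job the body of `process` would live in the item processor, and the collected `Mismatch` objects would go to the item writer.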
Questions :
- Is it overkill to make a DB call in the item processor for each record? Or would it be overkill to read both data sources in parallel in a tasklet and compare them in memory? (I am not yet sure whether I can do the simultaneous read in a tasklet.) We are looking at roughly 400,000 records in the primary table for the first run and around 2,000 for each daily run thereafter.
- Also, the primary database has two tables under different schemas that need to be compared; one of them is the base table and serves as the reference for comparison against both the other table and the other data source. I currently do this with a join query, which the item reader uses to fetch the data. Is there a better way to do this?
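For the in-memory alternative from the first question, this is roughly what I picture: a rough sketch assuming both result sets fit in memory and can be reduced to a key and a single comparable value. The class name, the `diff` method, and the sample data are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class InMemoryCompareSketch {

    // Returns one line per key whose value differs or that is missing from the secondary map.
    static List<String> diff(Map<String, String> primary, Map<String, String> secondary) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> entry : primary.entrySet()) {
            String key = entry.getKey();
            if (!secondary.containsKey(key)) {
                mismatches.add(key + ": missing in secondary");
            } else if (!Objects.equals(entry.getValue(), secondary.get(key))) {
                mismatches.add(key + ": " + entry.getValue() + " != " + secondary.get(key));
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the report order is predictable.
        Map<String, String> db2 = new LinkedHashMap<>();
        db2.put("A", "1");
        db2.put("B", "2");
        db2.put("C", "3");

        Map<String, String> hana = new LinkedHashMap<>();
        hana.put("A", "1");
        hana.put("B", "9");

        diff(db2, hana).forEach(System.out::println);
    }
}
```

This trades one query per record for two bulk reads, at the cost of holding both sides in memory for the duration of the comparison.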