I have a Spring Batch job which processes the following rows, using the standard reader, processor and writer pattern.
load_id  policy_number  slice_numb  asset_code  surrender_value
923      V317865        V317865     XXH         46230.340000
923      V318291        V318291     XXA         40664.230000
923      V318757        V318757     XXA         73263.360000
923      V318757        V318757     XXF         36575.820000
923      V318757        V318757     XXI         8723.330000
923      V318782        V318782     XXI         9141.550000
923      V318782        V318782     XXF         28329.550000
923      V318782        V318782     XXA         76776.220000
For each row I process, I need the SUM(surrender_value) across all rows with the same policy_number. Take policy_number V318757 as an example: it has three rows. I then need to report the percentage of the policy's total surrender value that each row contributes.
I have two ideas on how I might implement this, but I'm unsure which is the better approach.
First option - move the SUM/grouping logic into the SQL query used by the reader. This means all the information I need is available to the processor, but I have to map some extra fields.
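For the first option, if the database supports window functions, the reader's query could expose the per-policy total on every row without collapsing the rows themselves. A sketch of what I have in mind (the table name policy_slice is made up, since I haven't shown the real schema):

```sql
-- policy_slice is a hypothetical table name; substitute the real one.
SELECT load_id,
       policy_number,
       slice_numb,
       asset_code,
       surrender_value,
       SUM(surrender_value) OVER (PARTITION BY policy_number) AS policy_total
FROM   policy_slice
WHERE  load_id = 923
```

The processor would then only need to compute surrender_value / policy_total * 100 per row, with no shared state. On a database without window functions, the same result could come from joining the table to a GROUP BY policy_number subquery.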
Second option - to aggregate the rows, I'd add a pre-processor which maintains a map of the totals per policy_number and a list of the rows affected. Once this processor is complete, I'd pass the resulting data structure to a second processor which does the standard work. My concern here is that the memory footprint could get very large, as I'd be caching the details of so many rows.
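To make the second option concrete, here is a minimal sketch of the aggregation state I'd keep in the pre-processing pass, using plain collections rather than real Spring Batch types. The class and method names are illustrative, not from my actual job:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the pre-processor's state: first pass accumulates a
// total per policy_number, second pass asks each row's share of
// that total. Memory grows with the number of distinct policies
// (plus whatever row details are cached alongside).
public class PolicyAggregator {

    // policy_number -> running total of surrender_value
    private final Map<String, BigDecimal> totals = new LinkedHashMap<>();

    // First pass: called once per row.
    public void accumulate(String policyNumber, BigDecimal surrenderValue) {
        totals.merge(policyNumber, surrenderValue, BigDecimal::add);
    }

    // Second pass: percentage of the policy total contributed by one
    // row, rounded to two decimal places.
    public BigDecimal percentOfTotal(String policyNumber, BigDecimal surrenderValue) {
        BigDecimal total = totals.get(policyNumber);
        return surrenderValue
                .multiply(BigDecimal.valueOf(100))
                .divide(total, 2, RoundingMode.HALF_UP);
    }
}
```

For the V318757 rows above (73263.36 + 36575.82 + 8723.33), the 36575.82 row works out to roughly 30.85% of the policy total. Using BigDecimal rather than double avoids rounding drift across the sums.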
Any advice or guidance would be appreciated.