Spring Batch - Aggregating Processor

Question

I have a spring batch job which processes the flowing rows. I'm using a standard reader, processor and writer pattern.

load_id policy_number   slice_numb  asset_code  surrender_value
923     V317865     V317865 XXH     XXH         46230.340000
923     V318291     V318291 XXA     XXA         40664.230000
923     V318757     V318757 XXA     XXA         73263.360000
923     V318757     V318757 XXF     XXF         36575.820000
923     V318757     V318757 XXI     XXI         8723.330000
923     V318782     V318782 XXI     XXI         9141.550000
923     V318782     V318782 XXF     XXF         28329.550000
923     V318782     V318782 XXA     XXA         76776.220000

For each row i process i need to get the SUM(surrender_value) for rows with the same policy_number. Note policy_number V318757 as a example with three rows. I need to report of the percentage of the total surrender value this row is providing.

I have two idea on how i might implement this but an unsure which is the better approach

First option - move the SUM/Grouping logic to the SQL query used by the reader. This means all the information i need is available to the processor, but i have to map some extra fields.

Second option - In order to aggregate the rows i'd add a pre-processor which would maintain a map of the totals per policy_number and a list of the rows effected. Once this processor is complete i'd pass the resulting data structure to a second processor which would do the standard work. My concern here is that the memory footprint could get very large as i cache the details of so many rows.

Any advise or guidance would be appreciated.

Possible solution at http://stackoverflow.com/questions/19906772/grouping-summarizing-spring-batch-records/19908104#19908104 — Luca Basso Ricci, Feb 24 '14 at 19:29
@ballabax Thanks for that. I'm reluctant to move this logic to the writer phase since i feel it really belongs in the processor phase. I'll post a suggested solution later — emeraldjava, Feb 25 '14 at 10:40
Also http://stackoverflow.com/questions/18396259/how-to-write-more-then-one-class-in-spring-batch/18411497#18411497; move to writer is the right choice! — Luca Basso Ricci, Feb 25 '14 at 12:00

score 4 · Answer 1 · answered Feb 25 '14 at 18:16

I'd recommend doing that type of aggregation in the SQL query. Unless the data model this is coming from is very convoluted, adding that type of aggregation via SQL should be straight forward and removes issues of things like chunk boundaries that doing this in the processor/writer may cause (for example, if the first two records for V318757 appear in one chunk and the last one appears in another chunk you may not get the math correct. You can handle this with a custom CompletionPolicy but that adds complexity).

Spring Batch - Aggregating Processor

1 Answers1