
I'm using Spring Batch 5 and I have a use case where I need to group the input records (read from a database) by bookingId. The processor should then pick up the records from the reader output group by group, and the writer should also write its output group by group.

How can we achieve this with RepositoryItemReader? I'm struggling to find sample code for Spring Batch 5; it's a shame that there aren't enough samples for this standard use case. I would appreciate your help if you have worked on similar use cases.

Rahul Raj
  • About grouping logically related records, see my solution at https://stackoverflow.com/a/46346438/685806, but it does not use `RepositoryItemReader`. – Pino May 30 '23 at 15:57
  • @Pino I need to group the records based on a database column, and NOT database rows. – Rahul Raj May 30 '23 at 16:22
  • Not sure if I got your point, but a simple `order by bookingId` clause added to the SQL statement of the JdbcItemReader should do the trick (see the sketch after these comments). – Mar-Z May 30 '23 at 17:49
  • @Mar-Z Can you explain how this would work? When we order by `bookingId`, how do we ensure that the processor gets a list of records with only one particular `bookingId` (a list where all records share it)? I need to ensure that the next execution of the processor takes a list with a different `bookingId`. – Rahul Raj May 30 '23 at 17:51
  • The processor doesn't work on a list of items. It takes a single item (= database row) from the reader. Only the writer works on a list of items gathered in a chunk. My proposal of ordering was to process items in a desired sequence. – Mar-Z May 30 '23 at 18:14
  • @Mar-Z I want the records for one bookingId to be processed in a single thread/execution, clubbed together as a group. I do not want the same bookingId to be processed across multiple threads. – Rahul Raj May 31 '23 at 04:53
  • Rahul, my solution groups records having the same value in one or more db columns, just like you want. Read it more carefully. – Pino May 31 '23 at 15:23
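
For what it's worth, the sorted-read idea from the comments could look roughly like the sketch below with a `RepositoryItemReader`. The `BookingRecord` entity and `BookingRecordRepository` (a Spring Data paging repository) are assumed names, not taken from the question, and, as discussed above, sorting only controls the read order; it does not by itself give the processor or writer one group per chunk.

```java
import java.util.Map;

import org.springframework.batch.item.data.RepositoryItemReader;
import org.springframework.batch.item.data.builder.RepositoryItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.domain.Sort;

@Configuration
public class SortedReaderConfig {

    // BookingRecord and BookingRecordRepository (a PagingAndSortingRepository) are assumed names,
    // not defined in the question.
    @Bean
    public RepositoryItemReader<BookingRecord> bookingRecordReader(BookingRecordRepository repository) {
        return new RepositoryItemReaderBuilder<BookingRecord>()
                .name("bookingRecordReader")
                .repository(repository)
                .methodName("findAll")                          // Page<BookingRecord> findAll(Pageable pageable)
                .sorts(Map.of("bookingId", Sort.Direction.ASC)) // keeps records of the same group contiguous
                .pageSize(100)
                .build();
    }
}
```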

1 Answer


The chunk-oriented processing model is not suitable for this use case. No matter how you group items in your query (with a SQL `group by`, for instance), items from the same group could span multiple chunks when returned from the reader. Therefore, it is impossible to have one chunk per group.

Of course, it is technically possible to abuse the pattern and find a way to implement that requirement, but no such solution would be clean, as it would use the model for a use case it was not designed for.

The "cleanest" way I see to address this is to use a partitioned step, where partitions are created by the grouping criteria (bookingId in your case).

Mahmoud Ben Hassine
  • I don't agree, see my solution at https://stackoverflow.com/a/46346438/685806 – Pino May 31 '23 at 15:25
  • @Pino That solution falls into the "abusing the model for a use case it was not designed for" category, and has a few issues: 1) it requires the input to be sorted (as documented, but this is a hard constraint), 2) it returns a list of items and not individual items, and 3) one could end up with a chunk of lists of items from different groups (for example, with chunkSize=2 I can have one list of 10 items with groupId=x and another list of 50 items with groupId=y), which is not what the OP asked for in: "the writer should write output group by group". – Mahmoud Ben Hassine May 31 '23 at 17:20
  • Again, and as mentioned in the answer, while it is technically possible to implement grouping/aggregation with that model, the model itself is not suitable for such use cases. – Mahmoud Ben Hassine May 31 '23 at 17:22
  • About "abusing the model", I think it could be said of any class that extends a given class to add a new feature. About chunkSize, with my solution it refers to the collecting objects, not to the underlying records. – Pino Jun 01 '23 at 18:43