
I am fairly new to the Spring Batch framework.

I am trying to read about 1 million records in the reader with commit-interval=10000, and in the writer I need to do two things with the list of items:

  1. Store the list of items in the DB.
  2. Write some information from those items to a flat file.

I am thinking these two tasks could run in parallel, rather than writing sequential Java code in the writer class.

What is the best way to have two writers that operate in parallel, each doing its own task?

Tyr1on
  • Just to be clear, you're essentially asking for a `CompositeItemWriter` that works in parallel, correct? – Michael Minella Sep 22 '16 at 14:34
  • Yes, precisely. – Tyr1on Sep 22 '16 at 17:40
  • Then you'll need to write your own. There isn't one currently available to do that. You'll basically mimic the code in our `CompositeItemWriter` but hand off the write for each delegate to a `TaskExecutor`. However, to be 100% honest, 1 million records is not a lot (depending on the size of each record), so this may be more work than it's worth... – Michael Minella Sep 22 '16 at 19:51
  • Ok, thank you for your input. It helps. – Tyr1on Sep 22 '16 at 20:15
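For illustration, a minimal sketch of what that comment describes, assuming the List-based `ItemWriter` contract of Spring Batch 3/4: a `CompositeItemWriter`-style class that hands each delegate's write to a `TaskExecutor` and waits for all of them. All names are illustrative, and note one caveat: the delegate writes run on the executor's threads, outside the chunk transaction, which is part of why this may be more work than it's worth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

import org.springframework.batch.item.ItemWriter;
import org.springframework.core.task.AsyncTaskExecutor;

public class ParallelCompositeItemWriter<T> implements ItemWriter<T> {

    private List<ItemWriter<? super T>> delegates;
    private AsyncTaskExecutor taskExecutor;

    @Override
    public void write(List<? extends T> items) throws Exception {
        // submit one write task per delegate
        List<Future<Void>> futures = new ArrayList<>();
        for (ItemWriter<? super T> delegate : delegates) {
            Callable<Void> task = () -> {
                delegate.write(items);
                return null;
            };
            futures.add(taskExecutor.submit(task));
        }
        // wait for every delegate; get() rethrows the first failure
        for (Future<Void> future : futures) {
            future.get();
        }
    }

    public void setDelegates(List<ItemWriter<? super T>> delegates) {
        this.delegates = delegates;
    }

    public void setTaskExecutor(AsyncTaskExecutor taskExecutor) {
        this.taskExecutor = taskExecutor;
    }
}
```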

1 Answer


You have several options.

option 1:

  • create a select step that writes all entries into a file
  • create two parallel steps that follow the first step; both read from the same file, one writes into the DB, the other one into the flat file (a sketch of this job layout follows the disadvantages below)

disadvantages:

  • if one of the two parallel steps fails or skips items, the contents of the file and of the DB will not be consistent
  • you have to create an additional step
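For illustration, this layout could be wired with the Java config flow API roughly as follows; `selectStep`, `dbWriteStep` and `fileWriteStep` are assumed to be defined elsewhere and all names are placeholders:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public class ExportJobConfig {

    public Job exportJob(JobBuilderFactory jobs, Step selectStep,
                         Step dbWriteStep, Step fileWriteStep) {
        // the two follow-up steps run as parallel flows after the select step
        Flow parallelFlows = new FlowBuilder<SimpleFlow>("parallelFlows")
                .split(new SimpleAsyncTaskExecutor())
                .add(new FlowBuilder<SimpleFlow>("dbFlow").start(dbWriteStep).build(),
                     new FlowBuilder<SimpleFlow>("fileFlow").start(fileWriteStep).build())
                .build();

        return jobs.get("exportJob")
                .start(new FlowBuilder<SimpleFlow>("selectFlow").start(selectStep).build())
                .next(parallelFlows)
                .end()
                .build();
    }
}
```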

option 2:
instead of trying to run the file writing and the DB writing in parallel, make your chunks run in parallel:

  • use a `SynchronizedItemStreamReader` to read from your source (the reader must be synchronized if you use parallel chunk processing)
  • use a composite writer configured with the DB writer and the file writer (note: you have to wrap your file writer in a synchronized writer; there is no such class in the framework, but the principle is the same as in `SynchronizedItemStreamReader`, see the sketch after this list)
  • configure the step to process chunks in parallel (set an `AsyncTaskExecutor` and the throttle limit)
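A sketch of that synchronized wrapper; the class name mirrors `SynchronizedItemStreamReader`, but as said above the framework does not ship such a writer, so this is hand-written:

```java
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ItemStreamWriter;

public class SynchronizedItemStreamWriter<T> implements ItemStreamWriter<T> {

    private ItemStreamWriter<T> delegate;

    @Override
    public synchronized void write(List<? extends T> items) throws Exception {
        // only one chunk at a time may reach the underlying file writer
        this.delegate.write(items);
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        this.delegate.open(executionContext);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        this.delegate.update(executionContext);
    }

    @Override
    public void close() throws ItemStreamException {
        this.delegate.close();
    }

    public void setDelegate(ItemStreamWriter<T> delegate) {
        this.delegate = delegate;
    }
}
```

The parallel chunk processing itself is then switched on at the step, e.g. with `.taskExecutor(new SimpleAsyncTaskExecutor()).throttleLimit(10)` on the step builder.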

advantage:

  • you can easily write 10 chunks in parallel if your DB can handle it

disadvantages:

  • if you are using parallel chunk processing, restarting a step where it left off is not possible; in case of a restart the step has to be executed completely, which means you need to handle entries that were already written to the DB

option 3:
forget about parallelism: writing to a file is a lot faster than writing to a DB, so the overhead shouldn't have a significant impact. Just use a composite writer configured with your DB writer and file writer.
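A minimal wiring of this option, assuming `dbItemWriter` and `fileItemWriter` stand for your existing JDBC and flat-file writers:

```java
import java.util.Arrays;

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.CompositeItemWriter;

public class WriterConfig {

    // the delegates are called one after the other, for every chunk,
    // within the same transaction
    public <T> CompositeItemWriter<T> compositeWriter(ItemWriter<T> dbItemWriter,
                                                      ItemWriter<T> fileItemWriter) {
        CompositeItemWriter<T> writer = new CompositeItemWriter<>();
        writer.setDelegates(Arrays.asList(dbItemWriter, fileItemWriter));
        return writer;
    }
}
```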

Hansjoerg Wingeier
  • Thank you for the detailed response, @Hansjoerg. If I go with option 3, it will be sequential processing, won't it? – Tyr1on Sep 22 '16 at 16:55
  • Yes, it will be a sequential process. I would still use a composite item writer; this way the entries for both targets are written in the same transaction. Note that you will have to register both writers as streams with the step, so that open/close are called and the execution context is updated in order to keep restartability. – Hansjoerg Wingeier Sep 22 '16 at 17:04
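For illustration, the stream registration from that last comment could look like this on the step builder; all bean names are placeholders, and only delegates that implement `ItemStream` (such as a `FlatFileItemWriter`) can be registered:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemWriter;

public class StepConfig {

    public <T> Step writeStep(StepBuilderFactory steps, ItemReader<T> reader,
                              ItemWriter<T> compositeWriter,
                              FlatFileItemWriter<T> fileItemWriter) {
        return steps.get("writeStep")
                .<T, T>chunk(10000)
                .reader(reader)
                .writer(compositeWriter)
                // register the file writer so open/update/close are called
                // and its position is saved in the execution context
                .stream(fileItemWriter)
                .build();
    }
}
```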