
I have gone through a lot of internet searches and other sources, but could not find a good reason why Spring Batch transactions work at the step level rather than at the job level in a multi-step job.

Even if a step is, as per the documentation, a domain-level, independent unit of the job, it is still part of a job (task) that needs to be done.

Let's say a job needs to read an Excel file with 2 sheets, each having 6 million rows.

Reading each sheet has been modeled as its own step. Each sheet has its own style of processing rows and writing them to the database.

The processing of one sheet has NO relation to the other sheet. They are independent units. But overall, the processing of each sheet is a sub-task of the job.

The processing of the file should be considered a FAILURE if there is a problem in any sheet, and any data written so far should be rolled back.

But because each sheet is read in its own step, a sheet that has been processed successfully won't be rolled back when a failure occurs in the next sheet.

I would rather not merge both steps into a single one just for the sake of a solution.

Why doesn't Spring Batch provide an option to roll back all steps when a job (task) fails?

I don't want to forward data from one step to another (and write to the database only once): as I said earlier, the data is not related at all, and 6 million rows is far too much to hold in memory for such a long time.

Amit Sharma

1 Answer


First, it's not true that transactions are at the step level. They are finer than that: they are at the chunk level.
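To make the chunk-level scope concrete, here is a minimal plain-Java simulation. It is not Spring Batch code, and `ChunkCommitSketch`, `StepResult` and `runStep` are hypothetical names invented for this illustration. A list stands in for the database table, and each chunk is buffered and only "committed" if every item in it was processed without an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical simulation of chunk-scoped commits; this is NOT Spring Batch code.
final class ChunkCommitSketch {

    /** Outcome of one simulated step. */
    static final class StepResult {
        final List<String> committed; // rows whose chunk was committed
        final boolean failed;         // true if a chunk threw and was discarded

        StepResult(List<String> committed, boolean failed) {
            this.committed = committed;
            this.failed = failed;
        }
    }

    static StepResult runStep(List<String> items, int commitInterval,
                              UnaryOperator<String> processor) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        List<String> committed = new ArrayList<>(); // stands in for the database table
        for (int start = 0; start < items.size(); start += commitInterval) {
            int end = Math.min(start + commitInterval, items.size());
            List<String> chunk = new ArrayList<>(); // uncommitted work of this "transaction"
            try {
                for (String item : items.subList(start, end)) {
                    chunk.add(processor.apply(item));
                }
            } catch (RuntimeException e) {
                // "Rollback": only the current chunk is dropped; earlier chunks stay.
                return new StepResult(committed, true);
            }
            committed.addAll(chunk); // "commit" of the whole chunk
        }
        return new StepResult(committed, false);
    }
}
```

With six rows, a commit interval of 2 and a failure on the fifth row, the first two chunks remain committed and only the third is discarded, which is the behavior the question observes across steps as well.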

The reason you don't want one big transaction around the job is performance. Things would not look good if you had a transaction taking multiple hours, with multiple gigabytes of uncommitted data, and with a very high risk that at the end the transaction would have to be rolled back due to a conflict with another transaction or, maybe worse, that it would block other users from accessing the same tables.

The way Spring Batch deals with errors midway is by allowing jobs to be restarted from the last successful step/chunk, after the error has been manually dealt with. This is not always possible, of course.
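The restart idea can be sketched in plain Java as well. Again, this is only a simulation under assumptions, not Spring Batch's API: `RestartSketch`, `table` and `committedCount` are hypothetical names, and the counter stands in for whatever checkpoint the framework persists between runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical simulation of restarting after the last committed chunk;
// this is NOT Spring Batch code.
final class RestartSketch {
    final List<String> table = new ArrayList<>(); // committed rows
    int committedCount = 0;                       // checkpoint kept between runs

    /** Runs (or resumes) the step; returns true if all items were committed. */
    boolean run(List<String> items, int commitInterval, UnaryOperator<String> processor) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        // Resume right after the last committed chunk instead of from item 0.
        for (int start = committedCount; start < items.size(); start += commitInterval) {
            int end = Math.min(start + commitInterval, items.size());
            List<String> chunk = new ArrayList<>();
            try {
                for (String item : items.subList(start, end)) {
                    chunk.add(processor.apply(item));
                }
            } catch (RuntimeException e) {
                return false; // current chunk discarded, checkpoint unchanged
            }
            table.addAll(chunk);  // "commit" the chunk
            committedCount = end; // advance the checkpoint with the commit
        }
        return true;
    }
}
```

A first call that fails in the third chunk leaves the checkpoint at 4. Once the bad row has been fixed by hand, calling `run` again on the same instance processes only the remaining rows, so nothing is written twice.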

Artefacto
  • You are correct that it is at the chunk level, but a chunk is itself part of a step. Regarding performance, would your answer have been the same if I had said there are only 10 rows in each sheet? The consequences of having large data in a transaction are a separate problem, one that exists even outside the context of Spring Batch. From a design perspective, Spring Batch should provide an option. Within a step, if an item fails, all items processed so far are rolled back (subject to the commit-interval); the very same concept should be extended, at the user's choice, to the job level. – Amit Sharma Feb 27 '16 at 04:45
  • Can anyone please help me with this problem? – Amit Sharma Apr 04 '16 at 10:53