
After reading a lot about restarting in Spring Batch, I've learned that:

  • Spring Batch can restart a failed job from the beginning of the step that failed. Example: Job1 -> step1, step2, step3 (FAIL) -> you can then restart from step3.

I would like a different behaviour, but I didn't find any solution that fits.

I have a job with a single step. This step reads a text file that can have a lot of lines. I want to protect our system against an unexpected termination (for example, our server shutting down abruptly). In that case, if we have already read X lines, I want to resume the job from line X+1 to the end.

Is this possible to achieve?

Thanks in advance.

Pabloch
    Yes, and it is supported out of the box. You need to enable the saving of meta-data with the job execution. Then if the step failed (or was aborted), it will continue where it left off. – M. Deinum Feb 24 '21 at 18:27
  • @MahmoudBenHassine yes, it helps. The part about restarting by ID is good. But if the process ends abruptly, the status stays STARTED. I could restart from that point after setting the status of the job and the step to FAILED. Is there any way to restart without manipulating the status in the database? – Pabloch Feb 25 '21 at 10:46
  • `Is there any way to restart without manipulating the status in the database?`: No, Spring Batch looks at the status in the job repository before starting/restarting any job, and if it finds a job execution marked as `STARTED`, it will think there is a running execution (while there isn't). The manual process is described here: https://stackoverflow.com/a/66211982/5019386. If you want to automate it, you need to write custom code that detects that a job has been killed abruptly (I really don't see how this is possible) and updates the job repository with the SQL queries shown there. – Mahmoud Ben Hassine Feb 25 '21 at 11:04
  • Thank you @MahmoudBenHassine. It is possible that we only need to launch this batch once per day, so maybe we can detect that it was killed by checking whether the batch has been STARTED for too much time (but that depends on how long our input file will be...) – Pabloch Feb 25 '21 at 11:11
  • This might work if you clearly define "too much time". – Mahmoud Ben Hassine Feb 25 '21 at 11:15

1 Answer


IMO, if the job stops abruptly at line X and you want to start at line X+1, it will happen only if the chunk size is 1. That is because with a chunk size of 1 every processed item is committed individually, so the execution context saved in the job repository records exactly where the job failed and where to restart from.

When your chunk size is greater than 1, let's say 8, and your job abruptly stops while item 4 is being processed, then the first 3 items of that chunk have not been committed to the job execution tables at the time of failure either, so on restart the step starts again from the 1st item of the same chunk. In this case you will process the first 3 items again!
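To make the chunk boundary behaviour concrete, here is a minimal plain-Java sketch. It is a toy model and not Spring Batch itself: the class `ChunkRestartSketch`, its fields, and the `crashAt` parameter are all hypothetical names, and `committedCount` only stands in for the position that the job repository saves at each chunk commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of chunk-oriented processing with restart; not the Spring Batch API. */
class ChunkRestartSketch {

    /** Stands in for the position persisted at each chunk commit. */
    int committedCount = 0;

    /** Every item that was handled, including work from a chunk that never committed. */
    final List<Integer> handled = new ArrayList<>();

    /**
     * Processes items starting after the last committed position.
     * crashAt is the 1-based item on which an abrupt stop is simulated (0 = never).
     */
    void run(int totalItems, int chunkSize, int crashAt) {
        int position = committedCount;                 // restart point
        while (position < totalItems) {
            int end = Math.min(position + chunkSize, totalItems);
            for (int item = position + 1; item <= end; item++) {
                if (item == crashAt) {
                    return;                            // abrupt stop: this chunk is never committed
                }
                handled.add(item);
            }
            committedCount = end;                      // chunk commit
            position = end;
        }
    }

    public static void main(String[] args) {
        ChunkRestartSketch job = new ChunkRestartSketch();
        job.run(16, 8, 4);                             // stops while handling item 4
        System.out.println("committed after crash: " + job.committedCount);
        job.run(16, 8, 0);                             // restart, no crash this time
        System.out.println("items handled in total: " + job.handled.size());
    }
}
```

With a chunk size of 8 and a stop on item 4, nothing has been committed, so the second run starts again at item 1 and items 1 to 3 are handled twice.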

This can be avoided if you enable graceful shutdown of the Spring Batch job when the process receives an interrupt signal from the kernel, so that the current chunk is completely processed before the process ends. This will minimize the number of incidents, but there will still be cases where the program doesn't get a chance to shut down gracefully, for example when your server's power plug is pulled out.
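The idea of finishing the current chunk before stopping can be sketched in plain Java as follows. This is only an illustration under assumptions: `GracefulChunkLoop`, `requestStop`, and `stopSignalAt` are hypothetical names, and the stop flag is set from inside the loop to simulate a signal arriving mid-chunk. In a real application the flag would be set from something like a JVM shutdown hook, and Spring Batch's `JobOperator` offers a `stop` operation for requesting a stop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of a stop request that is only honoured at chunk boundaries. */
class GracefulChunkLoop {

    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    /** Stands in for the position persisted at each chunk commit. */
    int committedCount;

    GracefulChunkLoop(int alreadyCommitted) {
        this.committedCount = alreadyCommitted;
    }

    /** Called when a stop is requested, for example from a shutdown hook. */
    void requestStop() {
        stopRequested.set(true);
    }

    /**
     * Runs until every item is committed or a stop request is seen between chunks.
     * stopSignalAt simulates a stop request arriving while that item is handled (0 = never).
     * Returns true if the whole input was committed.
     */
    boolean run(int totalItems, int chunkSize, int stopSignalAt) {
        while (committedCount < totalItems) {
            if (stopRequested.get()) {
                return false;                      // stop only between chunks
            }
            int end = Math.min(committedCount + chunkSize, totalItems);
            for (int item = committedCount + 1; item <= end; item++) {
                if (item == stopSignalAt) {
                    requestStop();                 // signal arrives mid-chunk; keep going
                }
            }
            committedCount = end;                  // the chunk still commits as a whole
        }
        return true;
    }

    public static void main(String[] args) {
        GracefulChunkLoop first = new GracefulChunkLoop(0);
        boolean done = first.run(20, 8, 12);
        System.out.println("finished=" + done + ", committed=" + first.committedCount);
        GracefulChunkLoop restart = new GracefulChunkLoop(first.committedCount);
        System.out.println("finished=" + restart.run(20, 8, 0));
    }
}
```

Because the stop is honoured only after a commit, the restart begins at the first item that was never handled, and nothing is processed twice.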

Suggestions:

  1. If the process is idempotent, there is no trouble if some items are re-processed.
  2. If the process is not idempotent, some kind of de-duplication check can probably be placed in a `CompositeItemProcessor`.
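A de-duplication check like the one in suggestion 2 could look roughly like this. It is a plain-Java sketch, not a Spring Batch class: `DedupFilterSketch` and its `alreadyWritten` set are hypothetical, and the set stands in for a lookup against a durable store (an in-memory set would not survive the crash). It follows the `ItemProcessor` convention that returning `null` filters an item out.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy de-duplication step; mirrors the "return null to filter" processor convention. */
class DedupFilterSketch {

    /** Stands in for a durable record of item keys whose side effects already happened. */
    private final Set<String> alreadyWritten;

    DedupFilterSketch(Set<String> alreadyWritten) {
        this.alreadyWritten = alreadyWritten;
    }

    /** Returns null for an item that was already handled, otherwise passes it through. */
    String process(String itemKey) {
        return alreadyWritten.contains(itemKey) ? null : itemKey;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        seen.add("line-1");
        seen.add("line-2");
        DedupFilterSketch filter = new DedupFilterSketch(seen);
        System.out.println(filter.process("line-1"));  // already handled, filtered out
        System.out.println(filter.process("line-3"));  // new item, passed through
    }
}
```

In a real job, logic like `process` would sit in an `ItemProcessor` placed first in the composite, so that later processors and the writer never see items that were already handled.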

For graceful shutdown of the job, I have written some PoC code. You can see it here - https://github.com/innovationchef/batchpay/blob/master/src/main/java/com/innovationchef/batchcommons/JobManager.java

Innovationchef
  • `IMO, if the job stops abruptly at line X and you want to start at line X+1, it will happen only if the chunk size is 1`. If chunkSize=1 and the job stops abruptly at line X, then line X will be reprocessed on restart (since the error happened at that line and its transaction has been rolled back). So the job will restart at line X and not line X+1. – Mahmoud Ben Hassine Feb 25 '21 at 06:32
  • @Innovationchef thanks for your code, it's very clear and useful. But I have a doubt. To check whether a job is running, Spring Batch checks that start_date is not null and end_date is null (a process that has not ended). But this doesn't solve the problem of an abrupt end. I solved it by setting an end_date manually (I don't like it, but I don't know of any alternative). And to check whether a job is restartable, you compare the status with STOPPED. My previous doubt applies here too: what happens if the status remains STARTED? – Pabloch Feb 25 '21 at 11:01
  • @Pabloch I think we should not update the batch tables ourselves for any reason. That is why users are given the JobExplorer object separately from the JobRepository. Coming to your question (what if the status remains STARTED?): you already know which statuses it can be in, and you can have your own definitions of graceful shutdown and restart. My code is just a PoC where I tried to send a SIGINT via the kernel and test whether the basic things work. – Innovationchef Feb 25 '21 at 13:24