
I have been playing with a Spring Batch job that reads a sample CSV file and dumps the records into a table. My question is around restarts: I have introduced a data issue in the 3rd line of the file (a value too long to insert).

In the first run

The first two lines get inserted and the third line fails (as expected).

When I restart

The fourth line is picked up and the rest of the file is processed

All the documentation seems to suggest that Spring Batch picks up where it left off. Does that mean the 3rd (problem) record is considered 'attempted' and hence won't be tried again? I was expecting all the restarts to fail until I fixed the file.

@Bean
public FlatFileItemReader<Person> reader() {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new ClassPathResource("sample-data.csv"))
            .delimited()
            .names(new String[]{"firstName", "lastName"})
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                setTargetType(Person.class);
            }})
            .build();
}

@Bean
public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Person>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
            .dataSource(dataSource)
            .build();
}

@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            .taskExecutor(taskExecutor())
            .throttleLimit(1)
            .build();
}

@Bean
public Job importUserJob(JobCompletionNotificationListener listener) {
    return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1)
            .build();
}
user993797

2 Answers


Please let me know whether you have gone through the links below. If it's not clear, I can share the same sample project on GitHub.

Spring Batch restart uncompleted jobs from the same execution and step

Spring Batch correctly restart uncompleted jobs in clustered environment

In production we always use a fault-tolerant step so that the job rejects the bad records and continues. Later, operations correct the data and re-execute the job. The advantage is that huge volumes of data can be processed continuously, with no need to wait for data correction.
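For reference, a fault-tolerant version of the step in the question could look roughly like this (a sketch only; the skipped exception types and the skip limit are assumptions, pick whatever matches your actual data issues):

@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            // skip bad records instead of failing the whole job
            .faultTolerant()
            .skip(FlatFileParseException.class)
            .skip(DataIntegrityViolationException.class)
            .skipLimit(100)
            .build();
}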

Please compare your code with the project below:

https://github.com/ngecom/stackoverflow-springbatchRestart

Rakesh
  • Sorry if I had not gotten the question right. I was able to restart the job (not re-run it). The second run did not start from the failed record but one record after it. The records are like good1, good2, bad1, good3. The job fails at bad1 and the rerun picks up good3. Shouldn't the re-run pick up bad1? – user993797 Jan 02 '21 at 16:31
  • If you restart the job with the same parameters it will fail again, as the data is wrong. I will be sharing the code shortly. – Rakesh Jan 02 '21 at 18:40
  • So I went through this in detail, and the thing that is causing this seems to be the task executor. If you add a task executor bean @Bean public TaskExecutor taskExecutor() { ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor(); exec.setCorePoolSize(1); return exec; } and then add it to the step, it seems to reproduce the issue. – user993797 Jan 03 '21 at 20:30
  • Please let me know why you are using a thread pool. If you want to go with thread pools, use partitions and multi-resource reading to get full control. Job restart is a stable feature of Spring Batch and is used by a huge number of corporates. – Rakesh Jan 04 '21 at 05:47

You have set a RunIdIncrementer on your job, so you will have a new job instance on each run. You need to remove that incrementer and pass the file as a job parameter to have the same job instance on each run. With this approach, all restarts will fail until you fix the file.
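For illustration, a sketch of that change (the parameter name inputFile and the launch snippet are assumptions, not part of the original post):

@Bean
public Job importUserJob(JobCompletionNotificationListener listener) {
    return jobBuilderFactory.get("importUserJob")
            // no RunIdIncrementer: the job instance is now identified by its parameters
            .listener(listener)
            .start(step1)
            .build();
}

// launching with the file as an identifying job parameter
JobParameters params = new JobParametersBuilder()
        .addString("inputFile", "sample-data.csv")
        .toJobParameters();
jobLauncher.run(importUserJob, params);

The reader would then typically be declared @StepScope and resolve its resource from #{jobParameters['inputFile']}, so restarting with the same parameters maps to the same (failed) job instance.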

As a side note, you can't have restartability if you use a multi-threaded step. This is because the state would not be consistent when using multiple threads, so you need to use a single-threaded step (remove the task executor). This is explained in the documentation here: Multi-threaded step.
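To make that concrete, the step from the question would become a plain single-threaded step, roughly:

@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            // no taskExecutor/throttleLimit: with a single thread the execution
            // context can reliably track where to restart from
            .build();
}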

Mahmoud Ben Hassine