
Hi, I have a problem with Spring Batch. I created a Job with two steps: the first step reads a CSV file in chunks, filters out bad values, and saves the records into the database; the second calls a stored procedure.

My problem is that, for some reason, the first step only partially reads the data file, a 2.5 GB CSV.

The file has about 13M records, but only about 400K are saved.

Does anybody know why this happens and how to solve it?

Java version: 8

Spring Boot version: 2.7.1

This is my step:

    @Autowired
    @Bean(name = "load_data_in_db_step")
    public Step importData(
            MyProcessor processor,
            MyReader reader,
            TaskExecutor executor,
            @Qualifier("step-transaction-manager") PlatformTransactionManager transactionManager
    ) {
        return stepFactory.get("experian_portals_imports")
                .<ExperianPortal, ExperianPortal>chunk(chunkSize)
                .reader(reader)
                .processor(processor)
                .writer(new JpaItemWriterBuilder<ExperianPortal>()
                        .entityManagerFactory(factory)
                        .usePersist(true)
                        .build()
                )
                .transactionManager(transactionManager)
                .allowStartIfComplete(true)
                .taskExecutor(executor)
                .build();
    }

This is the definition of MyReader:

    @Slf4j
    @Component
    public class MyReader extends FlatFileItemReader<ExperianPortal> {
        private final MyLineMapper mapper;
        private final Resource fileToRead;

        @Autowired
        public MyReader(
                MyLineMapper mapper,
                @Value("${ext.datafile}") String pathToDataFile
        ) {
            this.mapper = mapper;
            val formatter = DateTimeFormatter.ofPattern("yyyyMM");
            fileToRead = new FileSystemResource(String.format(pathToDataFile, formatter.format(LocalDate.now())));
        }

        @Override
        public void afterPropertiesSet() throws Exception {
            setLineMapper(mapper);
            setEncoding(StandardCharsets.ISO_8859_1.name());
            setLinesToSkip(1);
            setResource(fileToRead);
            super.afterPropertiesSet();
        }
    }

edit: I already tried a single-threaded strategy. I think the problem may be with the RepeatTemplate, but I don't know how to use it correctly.

edit 2: I gave up on the custom solution and ended up using the default components. They work fine and the problem was solved.
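
For anyone hitting the same issue, a minimal sketch of what the "default components" approach could look like here, assuming a step-scoped bean built with FlatFileItemReaderBuilder that reuses the question's MyLineMapper and ${ext.datafile} property (the bean method name and its placement in a @Configuration class are assumptions):

    import java.nio.charset.StandardCharsets;
    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;

    import org.springframework.batch.core.configuration.annotation.StepScope;
    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.annotation.Bean;
    import org.springframework.core.io.FileSystemResource;

    @Bean
    @StepScope
    public FlatFileItemReader<ExperianPortal> experianPortalReader(
            MyLineMapper mapper,
            @Value("${ext.datafile}") String pathToDataFile
    ) {
        // Same file-name resolution as the custom reader: the property is a
        // format string that receives the current yyyyMM suffix.
        String suffix = DateTimeFormatter.ofPattern("yyyyMM").format(LocalDate.now());
        return new FlatFileItemReaderBuilder<ExperianPortal>()
                .name("experianPortalReader")
                .resource(new FileSystemResource(String.format(pathToDataFile, suffix)))
                .encoding(StandardCharsets.ISO_8859_1.name())
                .linesToSkip(1)
                .lineMapper(mapper)
                .build();
    }

This avoids subclassing FlatFileItemReader entirely and lets the builder wire up the reader's state handling.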



1 Answer


This is because you are using a non-thread-safe item reader in a multi-threaded step. Your item reader extends FlatFileItemReader, and FlatFileItemReader is not thread-safe: Using FlatFileItemReader with a TaskExecutor (Thread Safety). You can try a single-threaded step (remove .taskExecutor(executor)) and you will see that the entire file is read.

What happens is that the threads read records concurrently and the read count is not honored (the threads all increment the read count, so the step "thinks" the file has been read entirely). You have a few options here:

  • synchronize the call to read in your item reader
  • wrap your reader in a SynchronizedItemStreamReader (the result would be the same as the previous point; see the sketch after this list)
  • make your item reader bean step-scoped
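
For the second option, a minimal sketch of the wrapper, assuming the MyReader bean from the question is injected as the delegate (the bean method name is illustrative):

    import org.springframework.batch.item.support.SynchronizedItemStreamReader;
    import org.springframework.context.annotation.Bean;

    @Bean
    public SynchronizedItemStreamReader<ExperianPortal> synchronizedReader(MyReader delegate) {
        // SynchronizedItemStreamReader serializes calls to read(), so the
        // concurrent chunk threads can no longer corrupt the delegate's state.
        SynchronizedItemStreamReader<ExperianPortal> reader = new SynchronizedItemStreamReader<>();
        reader.setDelegate(delegate);
        return reader;
    }

You would then pass this bean to .reader(...) in the step definition instead of MyReader. Note that reads become serialized; only processing and writing still run in parallel.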
Mahmoud Ben Hassine
  • Hi, the synchronization strategy and the single-thread approach don't work in my case; they read even fewer records, about 9K. Maybe the problem is in the step's completion policy, but I'm not sure about that. – agl95 Aug 02 '22 at 18:59