Hi, I have a problem with Spring Batch. I created a Job with two steps: the first step reads a CSV file in chunks, filters out bad values, and saves the rows into the database; the second calls a stored procedure.
My problem is that, for some reason, the first step only partially reads the data file, a 2.5 GB CSV.
The file has about 13M records, but only about 400K are saved.
Does anybody know why this happens and how to solve it?
Java version: 8
Spring Boot version: 2.7.1
This is my step (the `@Autowired` annotation that was on this `@Bean` method is redundant and has been removed):

@Bean(name = "load_data_in_db_step")
public Step importData(
        MyProcessor processor,
        MyReader reader,
        TaskExecutor executor,
        @Qualifier("step-transaction-manager") PlatformTransactionManager transactionManager
) {
    return stepFactory.get("experian_portals_imports")
            .<ExperianPortal, ExperianPortal>chunk(chunkSize)
            .reader(reader)
            .processor(processor)
            .writer(new JpaItemWriterBuilder<ExperianPortal>()
                    .entityManagerFactory(factory)
                    .usePersist(true)
                    .build()
            )
            .transactionManager(transactionManager)
            .allowStartIfComplete(true)
            .taskExecutor(executor)
            .build();
}
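One thing worth noting about the step above: `FlatFileItemReader` is documented as not thread-safe, and the step runs with a `TaskExecutor`, so concurrent `read()` calls can corrupt the reader's position and lead to exactly this kind of partial read. A minimal sketch of one commonly suggested fix, wrapping the reader in Spring Batch's `SynchronizedItemStreamReader` (the bean wiring shown here is an assumption, not the original configuration):

```java
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.batch.item.support.builder.SynchronizedItemStreamReaderBuilder;
import org.springframework.context.annotation.Bean;

// Sketch: serialize access to the non-thread-safe FlatFileItemReader so a
// multi-threaded step cannot interleave read() calls. "MyReader" and
// "ExperianPortal" come from the question; the bean name is hypothetical.
public class ReaderConfig {

    @Bean
    public SynchronizedItemStreamReader<ExperianPortal> synchronizedReader(MyReader delegate) {
        return new SynchronizedItemStreamReaderBuilder<ExperianPortal>()
                .delegate(delegate) // all threads share one reader behind a lock
                .build();
    }
}
```

The step would then receive the synchronized wrapper instead of `MyReader` directly; the reader's `saveState` should also be disabled when reading concurrently, since restart state is meaningless across interleaved threads.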
This is the definition of MyReader (note: the constructor was originally named `ExperianPortalReader`, which would not compile inside a class named `MyReader`; it is corrected here):

@Slf4j
@Component
public class MyReader extends FlatFileItemReader<ExperianPortal> {
    private final MyLineMapper mapper;
    private final Resource fileToRead;

    @Autowired
    public MyReader(
            MyLineMapper mapper,
            @Value("${ext.datafile}") String pathToDataFile
    ) {
        this.mapper = mapper;
        val formatter = DateTimeFormatter.ofPattern("yyyyMM");
        fileToRead = new FileSystemResource(String.format(pathToDataFile, formatter.format(LocalDate.now())));
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        setLineMapper(mapper);
        setEncoding(StandardCharsets.ISO_8859_1.name());
        setLinesToSkip(1);
        setResource(fileToRead);
        super.afterPropertiesSet();
    }
}
edit: I already tried a single-threaded strategy. I suspect the problem may be with the RepeatTemplate, but I don't know how to use it correctly.
edit 2: I gave up on the custom solution and ended up using the default components instead; they work fine and the problem was solved.
Remember to use only Spring Batch components.
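For anyone hitting the same issue, a minimal sketch of what the "default components" approach can look like: building the reader with `FlatFileItemReaderBuilder` instead of subclassing `FlatFileItemReader`. The `ext.datafile` property, `MyLineMapper`, and the date-suffix formatting come from the question; the bean name and `@StepScope` wiring are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

public class DefaultReaderConfig {

    // Sketch: a stock FlatFileItemReader assembled by the builder, mirroring
    // the settings from the custom subclass in the question.
    @Bean
    @StepScope
    public FlatFileItemReader<ExperianPortal> experianPortalReader(
            MyLineMapper mapper,
            @Value("${ext.datafile}") String pathToDataFile) {
        // Resolve the yyyyMM placeholder in the configured path, as the
        // original constructor did.
        String path = String.format(pathToDataFile,
                DateTimeFormatter.ofPattern("yyyyMM").format(LocalDate.now()));
        return new FlatFileItemReaderBuilder<ExperianPortal>()
                .name("experianPortalReader")
                .resource(new FileSystemResource(path))
                .encoding(StandardCharsets.ISO_8859_1.name())
                .linesToSkip(1) // skip the CSV header row
                .lineMapper(mapper)
                .saveState(false) // required if the step reads from multiple threads
                .build();
    }
}
```

Using the builder also means the reader's `ExecutionContext` name is set explicitly, which the builder enforces and a hand-rolled subclass can silently omit.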