I need to read a 10 GB flat file. To do that, I chose to use a ThreadPoolTaskExecutor to make my step multi-threaded.
I am wondering how the 4 worker threads work internally. How is it ensured that one thread doesn't read data that another thread has already read? If someone can explain how this works internally, that would be a great help.
@Bean
@StepScope
public FlatFileItemReader<Transaction> fileTransactionReader(@Value("#{jobParameters['inputFlatFile']}") Resource resource) {
    return new FlatFileItemReaderBuilder<Transaction>()
            .saveState(false)
            .resource(resource)
            .delimited()
            .names(new String[] {"account", "amount", "timestamp"})
            .fieldSetMapper(fieldSet -> {
                Transaction transaction = new Transaction();
                transaction.setAccount(fieldSet.readString("account"));
                transaction.setAmount(fieldSet.readBigDecimal("amount"));
                transaction.setTimestamp(fieldSet.readDate("timestamp", "yyyy-MM-dd HH:mm:ss"));
                return transaction;
            })
            .build();
}
Job and step configuration:
@Bean
public Job multithreadedJob() {
    return this.jobBuilderFactory.get("multithreadedJob")
            .start(step1())
            .build();
}

@Bean
public Step step1() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4);
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.afterPropertiesSet();
    return this.stepBuilderFactory.get("step1")
            .<Transaction, Transaction>chunk(100)
            .reader(fileTransactionReader(null))
            .writer(writer(null))
            .taskExecutor(taskExecutor)
            .build();
}
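
To make the question concrete, here is a minimal plain-Java sketch (no Spring Batch involved) of the general pattern being asked about: several worker threads pulling chunks from one shared reader. It is only an illustration under stated assumptions, not a description of what FlatFileItemReader does internally. The names `SharedReaderSketch`, `SharedLineReader` and `readAll` are hypothetical, and the sketch assumes that reads are serialized with a lock, which is one way to guarantee that no two threads receive the same line. Spring Batch ships a `SynchronizedItemStreamReader` wrapper built around that same idea.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedReaderSketch {

    /** Hypothetical stand-in for a shared item reader; not a Spring Batch class. */
    static final class SharedLineReader {
        private final BufferedReader delegate;

        SharedLineReader(BufferedReader delegate) {
            this.delegate = delegate;
        }

        // Assumption: only one thread at a time may advance the underlying reader,
        // so each line is handed to exactly one caller.
        synchronized String read() throws IOException {
            return delegate.readLine();
        }
    }

    /** Reads all lines of input using the given number of threads, chunkSize lines per chunk. */
    static List<String> readAll(String input, int threads, int chunkSize) throws Exception {
        SharedLineReader reader = new SharedLineReader(new BufferedReader(new StringReader(input)));
        List<String> written = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                workers.add(pool.submit(() -> {
                    while (true) {
                        // Each worker fills its own chunk from the shared reader.
                        List<String> chunk = new ArrayList<>();
                        String line;
                        while (chunk.size() < chunkSize && (line = reader.read()) != null) {
                            chunk.add(line);
                        }
                        if (chunk.isEmpty()) {
                            return null; // input exhausted, this worker is done
                        }
                        written.addAll(chunk); // stands in for the "write" phase
                    }
                }));
            }
            for (Future<?> worker : workers) {
                worker.get(); // propagate any exception thrown inside a worker
            }
        } finally {
            pool.shutdown();
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            input.append("line-").append(i).append('\n');
        }
        List<String> result = readAll(input.toString(), 4, 100);
        // prints "1000 lines, 1000 distinct"
        System.out.println(result.size() + " lines, " + new HashSet<>(result).size() + " distinct");
    }
}
```

In this sketch every line is read exactly once, but the order in which lines reach the output depends on thread scheduling, and a single chunk may contain non-adjacent lines because the lock is taken per read rather than per chunk.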