
I need to read a flat file that is 10 GB in size. For that, I chose to use a ThreadPoolTaskExecutor to make my step multi-threaded.

I am wondering how these 4 worker threads work internally. How does one thread avoid reading data that another thread has already read? If someone can explain how this works internally, that would be a great help.

@Bean
@StepScope
public FlatFileItemReader<Transaction> fileTransactionReader(@Value("#{jobParameters['inputFlatFile']}") Resource resource) {

    return new FlatFileItemReaderBuilder<Transaction>()
            .saveState(false)
            .resource(resource)
            .delimited()
            .names(new String[] {"account", "amount", "timestamp"})
            .fieldSetMapper(fieldSet -> {
                Transaction transaction = new Transaction();
                transaction.setAccount(fieldSet.readString("account"));
                transaction.setAmount(fieldSet.readBigDecimal("amount"));
                transaction.setTimestamp(fieldSet.readDate("timestamp", "yyyy-MM-dd HH:mm:ss"));

                return transaction;
            })
            .build();
}

Code -

@Bean
public Job multithreadedJob() {
    return this.jobBuilderFactory.get("multithreadedJob")
            .start(step1())
            .build();
}

@Bean
public Step step1() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4);
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.afterPropertiesSet();

    return this.stepBuilderFactory.get("step1")
            .<Transaction, Transaction>chunk(100)
            .reader(fileTransactionReader(null))
            .writer(writer(null))
            .taskExecutor(taskExecutor)
            .build();
}
Ken White
Jeff Cook

1 Answer


FlatFileItemReader is not in itself thread-safe, as it extends AbstractItemCountingItemStreamItemReader, whose javadoc states: "Subclasses are inherently not thread-safe." So strictly speaking, you should wrap it in a SynchronizedItemStreamReader. See also: Can I use FlatfileItemReader with Taskexecutor?

Having said that, if you

  • don't care about restartability,
  • don't care about the line numbers,
  • don't use a mapping that would require state,
  • set saveState to false,
  • and don't change the reader's default bufferedReaderFactory,

then the reader is just a thin wrapper around

  • a BufferedReader whose method readLine is called for each FlatFileItemReader::read,
  • and a LineMapper that maps each line to the target type

BufferedReader is thread-safe, so each call to readLine hands a complete line to exactly one thread. That is what makes your reader effectively safe to call in a multi-threaded step.
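To illustrate why the worker threads do not read each other's data, here is a minimal plain-JDK sketch (no Spring Batch involved). It assumes, as described above, that readLine on a shared BufferedReader is internally synchronized. The class name SharedReaderDemo and the method readWithWorkers are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class: several threads pull lines from ONE shared BufferedReader.
public class SharedReaderDemo {

    // Returns the set of lines read; fails if any line is handed out twice.
    static Set<String> readWithWorkers(BufferedReader shared, int workers) throws Exception {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                futures.add(pool.submit(() -> {
                    try {
                        String line;
                        // Each readLine() call consumes the next line for the calling thread only.
                        while ((line = shared.readLine()) != null) {
                            if (!seen.add(line)) {
                                throw new IllegalStateException("line read twice: " + line);
                            }
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder data = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            data.append("line-").append(i).append('\n'); // every line is unique
        }
        BufferedReader shared = new BufferedReader(new StringReader(data.toString()));
        Set<String> seen = readWithWorkers(shared, 4);
        System.out.println("distinct lines read: " + seen.size());
    }
}
```

Note that the order in which lines reach the threads is not deterministic, which is one reason line numbers and restartability are lost in this setup.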

But beware: the Spring Batch API makes no promises about the thread-safety of the reader. Quite the opposite, actually. So the multi-threaded behavior is, at least in theory, subject to change in future versions. Furthermore, several of the conditions listed above may someday no longer hold for your implementation. Thus, using a SynchronizedItemStreamReader is really recommended.
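A minimal sketch of such a wrapper, assuming the fileTransactionReader bean and the Transaction class from the question (Transaction is assumed to be in the same package). The configuration class name and the bean name synchronizedTransactionReader are made up for this illustration.

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical configuration class; Transaction is the domain class from the question.
@Configuration
public class SynchronizedReaderConfig {

    @Bean
    public SynchronizedItemStreamReader<Transaction> synchronizedTransactionReader(
            FlatFileItemReader<Transaction> fileTransactionReader) {
        // The wrapper synchronizes read() and delegates open/update/close to the flat file reader.
        SynchronizedItemStreamReader<Transaction> reader = new SynchronizedItemStreamReader<>();
        reader.setDelegate(fileTransactionReader);
        return reader;
    }
}
```

The step would then be given this bean in .reader(...) instead of the raw FlatFileItemReader. Reading becomes serialized, but processing and writing of the chunks still run on the 4 worker threads.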

See also Can spring batch multi-threaded step be used safely if number of items in file are very less?

Henning