
I am creating a job to fetch data from BigQuery and process it. My approach is to get the data in the reader, run it in chunks, and use a task executor to run the chunks on different threads.

TripDateTimeDecider decides the date range for which the query runs in the reader. TransactionReader makes the query to load the data. TransactionProcessor processes the loaded data. TransactionWriter writes the data to the table.

Flow I want: TripDateTimeDecider -> TransactionReader (get data from the BigQuery table) -> run the threads with the specified chunk size for TransactionProcessor and TransactionWriter.

But what I got: TripDateTimeDecider -> multiple TransactionReader threads reading the same data -> those threads run TransactionProcessor and TransactionWriter on the same data.

     - 2023-04-11 12:50:57.456 [taskExecutor-3] INFO  c.q.p.p.steps.TransactionReader - TransactionReader::read() for tripStartDateTime=  2022-03-01T00:00:00  and tripIntervalDateTime= 2022-03-01T06:00:00.0
     - 2023-04-11 12:51:01.286 [taskExecutor-3] INFO  c.q.p.p.utils.BigQuerySalesTransUtil - loadTransactionsFromURT for trip_start_date_time=2022-03-01T00:00:00  , tripIntervalDateTime= 2022-03-01T06:00:00.0 and currentEnv = dev
     - 2023-04-11 12:51:01.287 [taskExecutor-4] INFO  c.q.p.p.steps.TransactionReader - TransactionReader::read() for tripStartDateTime=  2022-03-01T00:00:00  and tripIntervalDateTime= 2022-03-01T06:00:00.0
     - 2023-04-11 12:51:01.287 [taskExecutor-4] INFO  c.q.p.p.utils.BigQuerySalesTransUtil - loadTransactionsFromURT for trip_start_date_time=2022-03-01T00:00:00  , tripIntervalDateTime= 2022-03-01T06:00:00.0 and currentEnv = dev
     - 2023-04-11 12:51:04.792 [taskExecutor-2] INFO  c.q.p.p.steps.TransactionReader - TransactionReader::read() for tripStartDateTime=  2022-03-01T00:00:00  and tripIntervalDateTime= 2022-03-01T06:00:00.0
     - 2023-04-11 12:51:04.792 [taskExecutor-2] INFO  c.q.p.p.utils.BigQuerySalesTransUtil - loadTransactionsFromURT for trip_start_date_time=2022-03-01T00:00:00  , tripIntervalDateTime= 2022-03-01T06:00:00.0 and currentEnv = dev
     - 2023-04-11 12:51:04.792 [taskExecutor-1] INFO  c.q.p.p.steps.TransactionReader - TransactionReader::read() for tripStartDateTime=  2022-03-01T00:00:00  and tripIntervalDateTime= 2022-03-01T06:00:00.0
     - 2023-04-11 12:51:04.792 [taskExecutor-1] INFO  c.q.p.p.utils.BigQuerySalesTransUtil - loadTransactionsFromURT for trip_start_date_time=2022-03-01T00:00:00  , tripIntervalDateTime= 2022-03-01T06:00:00.0 and currentEnv = dev
    
    
    
   
    @Configuration
    @EnableBatchProcessing
    @EnableTransactionManagement
    public class ReceiptScanningMicroBlinkJobConfig {
    
        @Autowired
        private JobBuilderFactory jobs;
    
        @Autowired
        private StepBuilderFactory steps;
    
        @Autowired
        private TripDateTimeDecider tripDateTimeDecider;
    
        @Autowired
        private MicroBlinkJobInitTasklet microBlinkJobInitTasklet;
    
        @Autowired
        private MicroBlinkJobEndTasklet microBlinkJobEndTasklet;
    
        @Autowired
        private StepBuilderFactory stepBuilderFactory;
    
        private static final String WILL_BE_INJECTED = null;
    
        @Bean
        @StepScope
        public ItemReader<TransactionReceiptScanRequest> transactionReader(@Value("#{jobExecutionContext['trip_start_date_time']}") String tripStartDateTime,
                                                                           @Value("#{jobExecutionContext['trip_interval_date_time']}") String tripIntervalDateTime,
                                                                           @Value("#{jobExecutionContext['interval_hours']}") String intervalHours,
                                                                           @Value("#{jobExecutionContext['ignored_status_code']}") String ignoredStatusCode) {
            return new TransactionReader(tripStartDateTime, tripIntervalDateTime, intervalHours, ignoredStatusCode);
        }
    
        @Bean
        @StepScope
        public ItemProcessor<TransactionReceiptScanRequest, TransactionReceiptScanRequest> transactionProcessor() {
            return new TransactionProcessor();
        }
    
        @Bean
        @StepScope
        public ItemWriter<TransactionReceiptScanRequest> transactionWriter() {
            return new TransactionWriter();
        }
    
        @Bean
        protected Step processLines() {
            return steps.get("processEntities").<TransactionReceiptScanRequest, TransactionReceiptScanRequest> chunk(10)
                    .reader(transactionReader(WILL_BE_INJECTED,WILL_BE_INJECTED,WILL_BE_INJECTED,WILL_BE_INJECTED))
                    .processor(transactionProcessor())
                    .writer(transactionWriter())
                    .taskExecutor(taskExecutor())
                    .build();
        }
    
    
        @Bean
        public Job job() {
    
            Flow flow = new FlowBuilder<SimpleFlow>("Job")
                    .next(tripDateTimeDecider)
                    .on(Constants.COMPLETED)
                    .end()
                    .from(tripDateTimeDecider)
                    .on(Constants.CONTINUE)
                    .to(initJobExecutionStep())
                    .next(processLines())
                    .next(endJobExecutionStep())
                    .next(tripDateTimeDecider)
                    .on(Constants.COMPLETED)
                    .end()
                    .build();
    
            return jobs.get("Job")
                    .incrementer(new RunIdIncrementer())
                    .listener(new DefaultJobListener())
                    .start(flow)
                    .end()
                    .build();
        }
    
        // start -> Init tasklet to get max trip date and put in context
        //startdate and endDate to reader
        // only columns
    
        @Bean
        public Step initJobExecutionStep() {
            return stepBuilderFactory
                    .get("microBlinkJobInitTasklet")
                    .tasklet(microBlinkJobInitTasklet)
                    .build();
        }
    
        @Bean
        public Step endJobExecutionStep() {
            return stepBuilderFactory
                    .get("microBlinkJobEndTasklet")
                    .tasklet(microBlinkJobEndTasklet)
                    .build();
        }
    
        @Bean
        public TaskExecutor taskExecutor(){
            ThreadPoolTaskExecutor threadPoolExecutor = new ThreadPoolTaskExecutor();
            threadPoolExecutor.setCorePoolSize(5);
            threadPoolExecutor.setMaxPoolSize(5);
            threadPoolExecutor.setQueueCapacity(10);
            // multiple instances jobs 5.5 Million ->63 days
            return threadPoolExecutor;
        }
    }
    
    
The above is the batch job configuration.

I referred to https://examples.javacodegeeks.com/java-development/enterprise-java/spring/batch/spring-batch-multithreading-example/. I want to run the reader once, and then the processor and writer should run in multiple threads based on the chunk size provided.
NIRAJ KUMAR

1 Answer


This means your item reader is not thread-safe. You need to synchronize the reader by wrapping it in a SynchronizedItemStreamReader.
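
A minimal sketch of that wrapping, assuming TransactionReader implements ItemStreamReader (its implementation is not shown in the question); the bean keeps the same step-scoped late binding as your current reader bean:

    import org.springframework.batch.item.support.SynchronizedItemStreamReader;

    @Bean
    @StepScope
    public SynchronizedItemStreamReader<TransactionReceiptScanRequest> transactionReader(@Value("#{jobExecutionContext['trip_start_date_time']}") String tripStartDateTime,
                                                                                          @Value("#{jobExecutionContext['trip_interval_date_time']}") String tripIntervalDateTime,
                                                                                          @Value("#{jobExecutionContext['interval_hours']}") String intervalHours,
                                                                                          @Value("#{jobExecutionContext['ignored_status_code']}") String ignoredStatusCode) {
        SynchronizedItemStreamReader<TransactionReceiptScanRequest> reader = new SynchronizedItemStreamReader<>();
        // The delegate must be an ItemStreamReader; the wrapper synchronizes read(),
        // so each item coming from BigQuery is handed to exactly one chunk-processing thread.
        reader.setDelegate(new TransactionReader(tripStartDateTime, tripIntervalDateTime, intervalHours, ignoredStatusCode));
        return reader;
    }

Only the read() calls are serialized; the processor and writer still run concurrently on the task executor's threads.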

Another option is to partition the input into distinct partitions and use multiple threads to process partitions concurrently. In this case, each thread will sequentially read items from the partition that was assigned to it.
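
A rough sketch of the partitioning option, assuming a hypothetical tripRangePartitioner that splits the overall trip date range into sub-ranges (the partition keys and context entries are illustrative). With this setup the worker step would not carry its own task executor, and the step-scoped reader would read its boundaries from the stepExecutionContext instead of the jobExecutionContext:

    import java.util.HashMap;
    import java.util.Map;
    import org.springframework.batch.core.partition.support.Partitioner;
    import org.springframework.batch.item.ExecutionContext;

    @Bean
    public Step partitionedProcessLines() {
        return stepBuilderFactory.get("partitionedProcessEntities")
                .partitioner("processEntities", tripRangePartitioner()) // hypothetical Partitioner bean
                .step(processLines())   // worker step, read sequentially within each partition
                .gridSize(5)
                .taskExecutor(taskExecutor())
                .build();
    }

    @Bean
    public Partitioner tripRangePartitioner() {
        return gridSize -> {
            Map<String, ExecutionContext> partitions = new HashMap<>();
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext context = new ExecutionContext();
                // Put this partition's sub-range boundaries, e.g.:
                // context.putString("trip_start_date_time", ...);
                // context.putString("trip_interval_date_time", ...);
                partitions.put("partition" + i, context);
            }
            return partitions;
        };
    }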

Mahmoud Ben Hassine
  • I had used public synchronized TransactionReceiptScanRequest read() throws Exception, so the data is loaded by a single thread, and I have the condition if (fieldValueListIterator == null) to load the data from BigQuery, so it is loaded by a single thread and then distributed over the other threads. I will try the suggested way as well. – NIRAJ KUMAR Apr 12 '23 at 06:05
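
For reference, a rough sketch of the reader pattern described in that comment (the actual TransactionReader is not shown in the question; the helper methods and the FieldValueList iterator are assumptions based on the comment and the BigQuery client API):

    import com.google.cloud.bigquery.FieldValueList;
    import java.util.Iterator;
    import org.springframework.batch.item.ItemReader;

    public class TransactionReader implements ItemReader<TransactionReceiptScanRequest> {

        private Iterator<FieldValueList> fieldValueListIterator; // lazily loaded BigQuery result

        @Override
        public synchronized TransactionReceiptScanRequest read() throws Exception {
            // Lazy load: the first thread to enter read() runs the BigQuery query;
            // every later call (from any thread) reuses the same iterator.
            if (fieldValueListIterator == null) {
                fieldValueListIterator = loadTransactionsFromBigQuery();
            }
            // Each call hands out the next row; returning null signals end of input.
            return fieldValueListIterator.hasNext() ? mapRow(fieldValueListIterator.next()) : null;
        }

        private Iterator<FieldValueList> loadTransactionsFromBigQuery() {
            // hypothetical helper: run the query and return the result rows
            return null;
        }

        private TransactionReceiptScanRequest mapRow(FieldValueList row) {
            // hypothetical helper: map a BigQuery row to the item type
            return null;
        }
    }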