I am creating a Spring Batch job that fetches data from BigQuery and processes it. My approach is to load the data in the reader, then process it in chunks, using a task executor to run the chunks on different threads.
TripDateTimeDecider decides the date range for which the reader's query runs. TransactionReader runs the query that loads the data. TransactionProcessor processes the loaded data. TransactionWriter writes the data to the table.
Flow I want: TripDateTimeDecider -> TransactionReader (gets the data from the BigQuery table once) -> multiple threads run TransactionProcessor and TransactionWriter, each on a chunk of the specified size.
What I get instead: TripDateTimeDecider -> TransactionReader runs on multiple threads, each reading the same data -> each of those threads runs TransactionProcessor and TransactionWriter on that same data.
- 2023-04-11 12:50:57.456 [taskExecutor-3] INFO c.q.p.p.steps.TransactionReader - TransactionReader::read() for tripStartDateTime= 2022-03-01T00:00:00 and tripIntervalDateTime= 2022-03-01T06:00:00.0
- 2023-04-11 12:51:01.286 [taskExecutor-3] INFO c.q.p.p.utils.BigQuerySalesTransUtil - loadTransactionsFromURT for trip_start_date_time=2022-03-01T00:00:00 , tripIntervalDateTime= 2022-03-01T06:00:00.0 and currentEnv = dev
- 2023-04-11 12:51:01.287 [taskExecutor-4] INFO c.q.p.p.steps.TransactionReader - TransactionReader::read() for tripStartDateTime= 2022-03-01T00:00:00 and tripIntervalDateTime= 2022-03-01T06:00:00.0
- 2023-04-11 12:51:01.287 [taskExecutor-4] INFO c.q.p.p.utils.BigQuerySalesTransUtil - loadTransactionsFromURT for trip_start_date_time=2022-03-01T00:00:00 , tripIntervalDateTime= 2022-03-01T06:00:00.0 and currentEnv = dev
- 2023-04-11 12:51:04.792 [taskExecutor-2] INFO c.q.p.p.steps.TransactionReader - TransactionReader::read() for tripStartDateTime= 2022-03-01T00:00:00 and tripIntervalDateTime= 2022-03-01T06:00:00.0
- 2023-04-11 12:51:04.792 [taskExecutor-2] INFO c.q.p.p.utils.BigQuerySalesTransUtil - loadTransactionsFromURT for trip_start_date_time=2022-03-01T00:00:00 , tripIntervalDateTime= 2022-03-01T06:00:00.0 and currentEnv = dev
- 2023-04-11 12:51:04.792 [taskExecutor-1] INFO c.q.p.p.steps.TransactionReader - TransactionReader::read() for tripStartDateTime= 2022-03-01T00:00:00 and tripIntervalDateTime= 2022-03-01T06:00:00.0
- 2023-04-11 12:51:04.792 [taskExecutor-1] INFO c.q.p.p.utils.BigQuerySalesTransUtil - loadTransactionsFromURT for trip_start_date_time=2022-03-01T00:00:00 , tripIntervalDateTime= 2022-03-01T06:00:00.0 and currentEnv = dev
@Configuration
@EnableBatchProcessing
@EnableTransactionManagement
public class ReceiptScanningMicroBlinkJobConfig {

    @Autowired
    private JobBuilderFactory jobs;
    @Autowired
    private StepBuilderFactory steps;
    @Autowired
    private TripDateTimeDecider tripDateTimeDecider;
    @Autowired
    private MicroBlinkJobInitTasklet microBlinkJobInitTasklet;
    @Autowired
    private MicroBlinkJobEndTasklet microBlinkJobEndTasklet;
    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    private static final String WILL_BE_INJECTED = null;

    @Bean
    @StepScope
    public ItemReader<TransactionReceiptScanRequest> transactionReader(
            @Value("#{jobExecutionContext['trip_start_date_time']}") String tripStartDateTime,
            @Value("#{jobExecutionContext['trip_interval_date_time']}") String tripIntervalDateTime,
            @Value("#{jobExecutionContext['interval_hours']}") String intervalHours,
            @Value("#{jobExecutionContext['ignored_status_code']}") String ignoredStatusCode) {
        return new TransactionReader(tripStartDateTime, tripIntervalDateTime, intervalHours, ignoredStatusCode);
    }

    @Bean
    @StepScope
    public ItemProcessor<TransactionReceiptScanRequest, TransactionReceiptScanRequest> transactionProcessor() {
        return new TransactionProcessor();
    }

    @Bean
    @StepScope
    public ItemWriter<TransactionReceiptScanRequest> transactionWriter() {
        return new TransactionWriter();
    }

    @Bean
    protected Step processLines() {
        return steps.get("processEntities")
                .<TransactionReceiptScanRequest, TransactionReceiptScanRequest>chunk(10)
                .reader(transactionReader(WILL_BE_INJECTED, WILL_BE_INJECTED, WILL_BE_INJECTED, WILL_BE_INJECTED))
                .processor(transactionProcessor())
                .writer(transactionWriter())
                .taskExecutor(taskExecutor())
                .build();
    }

    @Bean
    public Job job() {
        Flow flow = new FlowBuilder<SimpleFlow>("Job")
                .next(tripDateTimeDecider)
                .on(Constants.COMPLETED)
                .end()
                .from(tripDateTimeDecider)
                .on(Constants.CONTINUE)
                .to(initJobExecutionStep())
                .next(processLines())
                .next(endJobExecutionStep())
                .next(tripDateTimeDecider)
                .on(Constants.COMPLETED)
                .end()
                .build();
        return jobs.get("Job")
                .incrementer(new RunIdIncrementer())
                .listener(new DefaultJobListener())
                .start(flow)
                .end()
                .build();
    }

    // start -> Init tasklet to get max trip date and put in context
    // startdate and endDate to reader
    // only columns
    @Bean
    public Step initJobExecutionStep() {
        return stepBuilderFactory
                .get("microBlinkJobInitTasklet")
                .tasklet(microBlinkJobInitTasklet)
                .build();
    }

    @Bean
    public Step endJobExecutionStep() {
        return stepBuilderFactory
                .get("microBlinkJobEndTasklet")
                .tasklet(microBlinkJobEndTasklet)
                .build();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor threadPoolExecutor = new ThreadPoolTaskExecutor();
        threadPoolExecutor.setCorePoolSize(5);
        threadPoolExecutor.setMaxPoolSize(5);
        threadPoolExecutor.setQueueCapacity(10);
        // multiple instances jobs 5.5 Million ->63 days
        return threadPoolExecutor;
    }
}
The above is the batch job configuration.
I based it on this example:
https://examples.javacodegeeks.com/java-development/enterprise-java/spring/batch/spring-batch-multithreading-example/
I want the reader to run only once, and then the processor and writer should run in multiple threads, each handling a chunk of the configured size.
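To make the intended behaviour concrete, here is a minimal plain-Java sketch of "load once, then let several threads pull chunks from the same reader". It is not Spring Batch code and not my actual TransactionReader: `LoadOnceReader`, `LoadOnceReaderDemo` and the integer rows are hypothetical stand-ins, and the loader lambda only simulates the BigQuery call. The sketch assumes a single reader instance is shared by all worker threads, and it follows the usual reader contract of returning `null` when the data is exhausted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for an item reader: read() returns null once the data is exhausted. */
class LoadOnceReader<T> {
    private final Supplier<List<T>> loader; // simulates the query that loads the data
    private Iterator<T> iterator;           // created on the first read() only

    LoadOnceReader(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    // synchronized: only one thread at a time can trigger the load or advance the iterator,
    // so the loader runs once and every item is handed to exactly one caller
    synchronized T read() {
        if (iterator == null) {
            iterator = loader.get().iterator();
        }
        return iterator.hasNext() ? iterator.next() : null;
    }
}

class LoadOnceReaderDemo {
    static final AtomicInteger loadCount = new AtomicInteger();

    /** Consumes `items` rows with `threads` workers in chunks of `chunkSize`; returns all rows handed out. */
    static List<Integer> run(int items, int threads, int chunkSize) {
        loadCount.set(0);
        LoadOnceReader<Integer> reader = new LoadOnceReader<>(() -> {
            loadCount.incrementAndGet(); // counts how often the "query" is executed
            List<Integer> rows = new ArrayList<>();
            for (int i = 0; i < items; i++) {
                rows.add(i);
            }
            return rows;
        });
        ConcurrentLinkedQueue<Integer> written = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                while (true) {
                    // build one chunk by calling the shared reader repeatedly
                    List<Integer> chunk = new ArrayList<>();
                    Integer item;
                    while (chunk.size() < chunkSize && (item = reader.read()) != null) {
                        chunk.add(item);
                    }
                    if (chunk.isEmpty()) {
                        return; // reader exhausted, this worker is done
                    }
                    written.addAll(chunk); // placeholder for "process, then write the chunk"
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(written);
    }

    public static void main(String[] args) {
        List<Integer> out = run(95, 5, 10);
        System.out.println("items written: " + out.size() + ", loads: " + loadCount.get());
    }
}
```

In this sketch the load happens inside the first `read()` call and all later calls only advance the shared iterator, which is the behaviour I am after. Spring Batch also ships `SynchronizedItemStreamReader` for wrapping a delegate reader in a multi-threaded step; whether it fits here depends on how TransactionReader loads its data, which the sketch does not cover.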