
I have an application that polls multiple directories and then sends job requests to Spring Batch; every directory is registered as a different flow. Is it possible to run this in parallel? My use case is that every directory is connected to a different business entity, and when one flow is stuck on a malformed file, or the MQ broker for a particular entity is not available, the others need to continue working.
I registered the flows with IntegrationFlowContext:

@Configuration
@RequiredArgsConstructor
@Slf4j
public class IntegrationConfigSO implements CommandLineRunner {
    
    private final HalFileAdapterConfig config;
    private final JobRepository jobRepository;
    private final BatchJobs batchJobs;
    private final ApplicationIntegrationEventPublisher eventPublisher;
    private final IntegrationFlowContext flowContext;
    
    @Override
    public void run(String... args) throws Exception {
        registerFlows();
    }
    
    public void registerFlows() {
        Arrays.stream(config.getSystemsEnabled())
                .map(this::flow)
                .forEach(flow -> flowContext.registration(flow)
                        .id(UUID.randomUUID().toString())
                        .useFlowIdAsPrefix()
                        .register()
                );
    }
    
    public IntegrationFlow flow(String systemId) {
        return IntegrationFlows
                .from(
                        fileReadingMessageSource(systemId),
                        c -> c.poller(Pollers.fixedDelay(config.getPollTimeSeconds(), TimeUnit.SECONDS)
                                .maxMessagesPerPoll(config.getMaxFilesPerPoll())))
                .transform(fileMessageToJobRequest())
                .handle(jobLaunchingGateway())
                .channel("jobReplyChannel")
                .get();
    }
    
    
    public MessageSource<File> fileReadingMessageSource(String systemId) {
        FileReadingMessageSource source = new FileReadingMessageSource(getCustomFileComparator());
        source.setAutoCreateDirectory(true);
        source.setDirectory(new File(config.getBaseDirectory() + File.separatorChar + systemId));
        source.setScanner(directoryScanner());
        return source;
    }
    
    @Bean
    public DirectoryScanner directoryScanner() {
        CustomRecursiveDirScanner scanner = new CustomRecursiveDirScanner(config);
        CompositeFileListFilter<File> filters = new CompositeFileListFilter<>();
        filters.addFilter(new AcceptOnceFileListFilter<>());
        scanner.setFilter(filters);
        return scanner;
    }
    
    @Bean
    public FileMessageToJobRequest fileMessageToJobRequest() {
        FileMessageToJobRequest fileMessageToJobRequest = new FileMessageToJobRequest(config, eventPublisher);
        fileMessageToJobRequest.setJob(batchJobs.job());
        return fileMessageToJobRequest;
    }
    
    @Bean
    @Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
    public JobLaunchingGateway jobLaunchingGateway() {
        SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
        simpleJobLauncher.setJobRepository(jobRepository);
        simpleJobLauncher.setTaskExecutor(new SyncTaskExecutor());
        JobLaunchingGateway jobLaunchingGateway = new JobLaunchingGateway(simpleJobLauncher);
        jobLaunchingGateway.setOutputChannel(jobReplyChannel());
        return jobLaunchingGateway;
    }
    
    @Bean
    public MessageChannel jobReplyChannel() {
        return new DirectChannel();
    }
    
}

1 Answer

Yes, this is a valid, possible, and working use case. The poller in Spring Integration relies on the TaskScheduler and its thread pool, so to be sure that all your parallel flows keep working, you need to make that thread pool big enough.

See docs for more info: https://docs.spring.io/spring-integration/docs/current/reference/html/configuration.html#namespace-taskscheduler

There is also a spring.integration.taskScheduler.poolSize global integration property (see the next section in that doc).
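For a plain (non-Boot) Spring application, that global property can be set in a META-INF/spring.integration.properties file on the classpath. A minimal sketch, with the pool size of 10 chosen here only as an illustration:

```properties
# META-INF/spring.integration.properties
# Global Spring Integration property: size of the default TaskScheduler pool
spring.integration.taskScheduler.poolSize=10
```

Note that, as discussed in the comments below, this property does not apply when Spring Boot auto-configures the scheduler.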

If you use Spring Boot, see the TaskScheduler auto-configuration: https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#features.task-execution-and-scheduling
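With Spring Boot, the auto-configured TaskScheduler has a pool size of one by default, so all polling flows end up sharing a single scheduling thread. A minimal sketch of the relevant Boot property (the value 10 is just an example; size it to at least the number of parallel flows):

```properties
# application.properties
# Spring Boot's auto-configured TaskScheduler defaults to a single thread;
# raise the pool size so each polling flow can run on its own thread
spring.task.scheduling.pool.size=10
```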

Artem Bilan
  • As I see in the docs, everything is already in place for parallel execution. The default pool suits my needs (my test case has only 2 flows), but everything happens synchronously: a job starts only when the previous one has finished. I added Thread.sleep(5000) in fileMessageToJobRequest() and while it waits 5 seconds, the other poller should make a new job request, but that happens only after the 5-second delay. Am I missing something? – Dragoslav Petrovic Feb 01 '22 at 14:57
  • No you are not. That's probably how Spring Batch works. See its docs for more info: https://docs.spring.io/spring-batch/docs/current/reference/html/ – Artem Bilan Feb 01 '22 at 15:03
  • Looking at the log output, the DirectoryScanner runs on the same thread ([scheduling-1]) for both flows. I have some error in my config, I just can't spot where. I enabled logging.level.org.springframework.integration=DEBUG, and the pool size is 10 (spring.integration.taskScheduler.poolSize=10). So the problem is in the integration setup, not Spring Batch. – Dragoslav Petrovic Feb 01 '22 at 19:14
  • No. That property is out of use if you rely on Spring Boot; Spring Boot provides only one thread for that scheduler. Consider modifying its specific configuration properties. – Artem Bilan Feb 01 '22 at 19:20
  • Thank you, everything works now. – Dragoslav Petrovic Feb 01 '22 at 21:42
  • Could you please update with the working solution? I am having the same problem... I have set a taskExecutor in the PollerSpec but my tasks are still running sequentially... thanks – Diego Ramos Jul 26 '22 at 22:49
  • You need to configure this property: `spring.task.scheduling.pool.size`. It defaults to one. – Artem Bilan Jul 26 '22 at 23:42