
I have a problem creating an asynchronous processor in Spring Batch. My processor gets an ID from the reader and creates an object based on the response from a SOAP call. Sometimes one input (ID) requires e.g. 60-100 SOAP calls, and sometimes just 1. I tried a multithreaded step that processed e.g. 50 inputs at a time, but it was useless: 49 threads finished their job in 1 second and were blocked, waiting for the one that was making 60-100 SOAP calls. Now I use AsyncItemProcessor + AsyncItemWriter, but this solution works slowly for me. As my input (IDs) is large, around 25k items read from the DB, I would like to start ~50-100 inputs at a time.

Here is my configuration:

@Configuration
public class BatchConfig {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;
    @Autowired
    public StepBuilderFactory stepBuilderFactory;
    @Autowired
    private DatabaseConfig databaseConfig;
    @Value(value = "classpath:Categories.txt")
    private Resource categories;

    @Bean
    public Job processJob() throws Exception {
        return jobBuilderFactory.get("processJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener())
                .flow(orderStep1())
                .end()
                .build();
    }

    @Bean
    public Step orderStep1() throws Exception {
        return stepBuilderFactory.get("orderStep1")
                .<Category, CategoryDailyResult>chunk(1)
                .reader(reader())
                .processor(asyncItemProcessor())
                .writer(asyncItemWriter())
                .taskExecutor(taskExecutor())
                .build();
    }

    @Bean
    public JobExecutionListener listener() {
        return new JobCompletionListener();
    }


    @Bean
    public ItemWriter asyncItemWriter() {
        AsyncItemWriter<CategoryDailyResult> asyncItemWriter = new AsyncItemWriter<>();
        asyncItemWriter.setDelegate(itemWriter());
        return asyncItemWriter;
    }

    @Bean
    public ItemWriter<CategoryDailyResult> itemWriter(){
        return new Writer();
    }

    @Bean
    public ItemProcessor asyncItemProcessor() {
        AsyncItemProcessor<Category, CategoryDailyResult> asyncItemProcessor = new AsyncItemProcessor<>();
        asyncItemProcessor.setDelegate(itemProcessor());
        asyncItemProcessor.setTaskExecutor(taskExecutor());
        return asyncItemProcessor;
    }

    @Bean
    public ItemProcessor<Category, CategoryDailyResult> itemProcessor(){
        return new Processor();
    }

    @Bean
    public TaskExecutor taskExecutor(){
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        taskExecutor.setConcurrencyLimit(50);
        return taskExecutor;
    }

    @Bean(destroyMethod = "")
    public ItemReader<Category> reader() throws Exception {
        String query = "select c from Category c where not exists elements(c.children)";

        JpaPagingItemReader<Category> reader = new JpaPagingItemReader<>();
        reader.setSaveState(false);
        reader.setQueryString(query);
        reader.setEntityManagerFactory(databaseConfig.entityManagerFactory().getObject());
        reader.setPageSize(1);

        return reader;
    }
}

How can I boost my application? Am I doing something wrong? Any feedback welcome ;)

@Edit: For an input of IDs 1 to 100, I want e.g. 50 threads executing the processor. I want them not to block each other: Thread1 processes input "1" for 2 minutes, and during that time I want Thread2 to process inputs "2", "8", "64", which are small and execute in a few seconds.

@Edit2: My goal: I have 25k IDs in the database. I read them with JpaPagingItemReader, and every ID is processed by the processor. Each item is independent of the others. For each ID I make a SOAP call 0-100 times in a loop, and then I create an object which I pass to the writer and save in the database. How can I obtain the best performance for such a task?

crooked
  • "this solution works slowly for me". What does that mean? What is the bottleneck? Have you done any profiling? – Michael Minella Aug 18 '17 at 14:46
  • These SOAP calls in the processor are the bottleneck. For just 1 input with ~60 calls it takes around 3 minutes. It works slowly because other threads are waiting for this long one. – crooked Aug 18 '17 at 14:50
  • +1 to @MichaelMinella , please give us more context about what you've done and what you expect. Aside from that, you should use a ThreadPoolTaskExecutor, since the simple async creates a new thread for each task – Sebastian Yonekura Baeza Aug 18 '17 at 14:52
  • Edited my question + ThreadPoolTaskExecutor didn't boost processing. – crooked Aug 18 '17 at 15:05
  • Have you tried using spring batch partitioner class? You need to partition your 25k IDs into different batches, and for each batch do the processing. – Amit K Bist Aug 23 '17 at 18:55
  • Proposal: Can you extend the SOAP service to accept more than one ID? If possible you could also send a batch of IDs to the service, eventually improving performance. – mrkernelpanic Sep 19 '17 at 13:17
  • @mrkernelpanic unfortunately I can't :/ – crooked Sep 19 '17 at 17:56

2 Answers


You should partition your job. Add a partitioned step like so:

@Bean
public Step partitionedOrderStep1(Step orderStep1) {
    return stepBuilderFactory.get("partitionedOrderStep1")
            .partitioner("orderStep1", new SimplePartitioner())
            .step(orderStep1)
            .taskExecutor(taskExecutor())
            .gridSize(10)  // number of concurrent partitions
            .build();
}

Then use that Step in your Job definition. The .gridSize() call configures the number of partitions to be concurrently executed. If any of your Reader, Processor, or Writer objects are stateful you need to annotate them with @StepScope.
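Note that `SimplePartitioner` creates empty execution contexts, so every partition would see the same data; for 25k IDs you would typically write a custom `Partitioner` that hands each partition its own ID range, which a `@StepScope` reader then queries. The splitting arithmetic could look like this (a plain-Java sketch only; the class name and the 1..25000 range are illustrative, and the Spring `Partitioner`/`ExecutionContext` wiring is omitted):

```java
// Sketch of the range-splitting arithmetic a custom Partitioner could use.
// The class name and the 1..25000 ID range are illustrative.
import java.util.ArrayList;
import java.util.List;

public class IdRangePartitioner {

    // Split [minId, maxId] into gridSize contiguous ranges. In a real
    // Partitioner, each range would go into its own ExecutionContext
    // (e.g. under "minValue"/"maxValue" keys) for a @StepScope reader.
    public static List<long[]> split(long minId, long maxId, int gridSize) {
        List<long[]> ranges = new ArrayList<>();
        long targetSize = (maxId - minId) / gridSize + 1;
        for (long start = minId; start <= maxId; start += targetSize) {
            ranges.add(new long[] { start, Math.min(start + targetSize - 1, maxId) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(1, 25000, 10)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

With 25k IDs and `gridSize(10)`, each partition would get a contiguous block of 2500 IDs to work through independently.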

Joe Chiavaroli

@KCrookedHand: I have dealt with a similar kind of scenario. I had to read a couple of thousand items and call a SOAP service (which I injected into the itemReader) for matching criteria.

My config looks like below. Basically, you have a couple of options to achieve parallel processing; two of them are the 'Partitioning' and 'Client-Server' approaches. I chose partitioning because it gives me more control over how many partitions I need based on my data.

Please use a ThreadPoolTaskExecutor, as @MichaelMinella mentioned, for the step execution below, using a tasklet where applicable.

    <batch:step id="notificationMapper">
        <batch:partition partitioner="partitioner"
            step="readXXXStep" />
    </batch:step>
</batch:job>

<batch:step id="readXXXStep">
    <batch:job ref="jobRef" job-launcher="jobLauncher"
        job-parameters-extractor="jobParameterExtractor" />
</batch:step>

<batch:job id="jobRef">

    <batch:step id="dummyStep" next="skippedItemsDecision">
        <batch:tasklet ref="dummyTasklet"/>
        <batch:listeners>
            <batch:listener ref="stepListener" />
        </batch:listeners>
    </batch:step>

    <batch:step id="xxx.readItems" next="xxx.then.finish">
        <batch:tasklet>
            <batch:chunk reader="xxxChunkReader" processor="chunkProcessor"
                writer="itemWriter" commit-interval="100">
            </batch:chunk>
        </batch:tasklet>
        <batch:listeners>
            <batch:listener ref="taskletListener" />
        </batch:listeners>
    </batch:step>

    ...
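The ThreadPoolTaskExecutor mentioned above could be declared in the same XML style as the rest of this config; the bean id and pool sizes below are illustrative, not taken from the original setup:

```xml
<!-- Reuses a fixed pool of threads instead of spawning a new thread
     per task the way SimpleAsyncTaskExecutor does -->
<bean id="taskExecutor"
      class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="50" />
    <property name="maxPoolSize" value="100" />
    <property name="queueCapacity" value="25000" />
</bean>
```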
Ashok Gudise