Using Spring Batch 2.2.1, I have configured a Spring Batch job using the following approach.

The configuration is as follows:
- Tasklet uses a ThreadPoolTaskExecutor limited to 15 threads
- throttle-limit is equal to the number of threads
- Chunk is used with:
  - a single synchronized adapter around JdbcCursorItemReader, to allow its use by many threads, as the Spring Batch documentation recommends:

    > You can synchronize the call to read() and as long as the processing and writing is the most expensive part of the chunk your step may still complete much faster than in a single threaded configuration.

  - saveState is false on JdbcCursorItemReader
  - a custom ItemWriter based on JPA. Note that the processing time of a single item can vary, from a few milliseconds up to several seconds (> 60 s in some cases).
  - commit-interval set to 1 (I know it could be higher, but that's not the issue)
- All JDBC pools are fine, per the Spring Batch documentation recommendations
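For concreteness, here is a minimal sketch of that setup in Java config (which Spring Batch 2.2 supports); the query, the ColumnMapRowMapper item type and the stub writer are placeholders rather than my real code, and the synchronized adapter is hand-written since, as far as I know, 2.2.1 does not ship one:

```java
import java.util.List;
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.ColumnMapRowMapper;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableBatchProcessing
public class MultiThreadedStepConfig {

    // Pool limited to 15 threads, matching the throttle-limit below.
    @Bean
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(15);
        executor.setMaxPoolSize(15);
        return executor;
    }

    // Single cursor reader shared by all threads; saveState is off because
    // restart state cannot be tracked when many threads read concurrently.
    @Bean
    public JdbcCursorItemReader<Map<String, Object>> cursorReader(DataSource dataSource) {
        JdbcCursorItemReader<Map<String, Object>> reader = new JdbcCursorItemReader<Map<String, Object>>();
        reader.setDataSource(dataSource);
        reader.setSql("SELECT * FROM my_table"); // hypothetical query
        reader.setRowMapper(new ColumnMapRowMapper());
        reader.setSaveState(false);
        return reader;
    }

    // Stand-in for the custom JPA writer; the real one persists each item
    // with JPA and its latency varies from milliseconds to over a minute.
    @Bean
    public ItemWriter<Map<String, Object>> jpaWriter() {
        return new ItemWriter<Map<String, Object>>() {
            public void write(List<? extends Map<String, Object>> items) {
                // persist items here
            }
        };
    }

    @Bean
    public Step step(StepBuilderFactory steps,
                     JdbcCursorItemReader<Map<String, Object>> cursorReader,
                     ItemWriter<Map<String, Object>> jpaWriter) {
        return steps.get("multiThreadedStep")
                .<Map<String, Object>, Map<String, Object>>chunk(1) // commit-interval = 1
                .reader(new SynchronizedReader<Map<String, Object>>(cursorReader))
                .writer(jpaWriter)
                .stream(cursorReader)      // keep open()/close() working on the delegate
                .taskExecutor(taskExecutor())
                .throttleLimit(15)         // equal to the pool size
                .build();
    }

    // Minimal synchronized adapter: serializes read() on the shared cursor.
    public static class SynchronizedReader<T> implements ItemReader<T> {
        private final ItemReader<T> delegate;

        public SynchronizedReader(ItemReader<T> delegate) {
            this.delegate = delegate;
        }

        public synchronized T read() throws Exception {
            return delegate.read();
        }
    }
}
```

Note the explicit `.stream(cursorReader)` registration: since the synchronized wrapper only implements ItemReader, the step would not otherwise call open()/close() on the underlying cursor.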
Running the batch leads to very strange and bad results, due to the following:

- at some point, if an item takes a long time in the writer, nearly all threads in the thread pool end up doing nothing instead of processing; only the thread running the slow writer is working.
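To make this easy to reproduce, here is a hypothetical writer (not my real JPA writer) in which one item in a hundred simulates a slow (> 60 s) write; while it sleeps, a thread dump shows the other pool threads sitting idle instead of picking up new chunks:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

import org.springframework.batch.item.ItemWriter;

// Hypothetical writer that mimics the real one's variable latency:
// most items are near-instant, but every 100th write blocks for 60 s.
public class VariableLatencyWriter implements ItemWriter<Object> {

    private final AtomicLong counter = new AtomicLong();

    public void write(List<? extends Object> items) throws Exception {
        for (Object item : items) {
            if (counter.incrementAndGet() % 100 == 0) {
                Thread.sleep(60000); // simulate the occasional slow item
            }
            // fast path: nothing to do for the other items
        }
    }
}
```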
Looking at the Spring Batch code, the root cause seems to be in this package:
- org/springframework/batch/repeat/support/
Is this behaviour a feature, or is it a limitation/bug?

If it's a feature, is there a way, through configuration, to keep all threads working without being starved by long-running items, and without having to rewrite everything?
Note that if all items take roughly the same time to process, everything works fine and multi-threading is OK; but if one item's processing takes much longer, then multi-threading is nearly useless for as long as the slow process runs.
Note that I opened this issue: