
Context

We have a batch job that replicates localized country names (i.e. translations of country names into different languages) from an external DB into ours. The idea was to process all localized country names for a single country in one chunk (i.e. first chunk: all translations for Andorra, next chunk: all translations for U.A.E., etc.). We use JdbcCursorItemReader to read the external data, plus some Oracle analytic functions to provide the total number of translations available for each country, something like:

select country_code, language_code, localized_name, COUNT(1) OVER(PARTITION BY c_lng.country_code) as lng_count
from EXT_COUNTRY_LNG c_lng
order by c_lng.country_code, c_lng.language_code

Problem

So cutting this input into chunks looks simple: stop the chunk once you've read exactly the number of rows given in lng_count and start a new one with the next row read. In practice, though, it turns out not to be so simple :(
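To make the intended cut concrete, here is a minimal, library-free sketch of the rule described above. It is an illustration only, not Spring Batch code: `Row` and `CountBasedChunking` are hypothetical names, and `lngCount` stands for the value of the `lng_count` column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one row of the query result (not a Spring Batch type).
class Row {
    final String countryCode;
    final String languageCode;
    final int lngCount; // value of COUNT(1) OVER (PARTITION BY country_code)

    Row(String countryCode, String languageCode, int lngCount) {
        this.countryCode = countryCode;
        this.languageCode = languageCode;
        this.lngCount = lngCount;
    }
}

class CountBasedChunking {
    // Cuts rows (already ordered by country) into chunks of lngCount rows each.
    static List<List<Row>> chunk(List<Row> rows) {
        List<List<Row>> chunks = new ArrayList<>();
        List<Row> current = new ArrayList<>();
        for (Row row : rows) {
            current.add(row);
            // lngCount is identical for every row of a country,
            // so the most recently read row is enough to decide.
            if (current.size() >= row.lngCount) {
                chunks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current); // defensive: keep an incomplete trailing group
        }
        return chunks;
    }
}
```

For two Andorra rows with `lngCount` 2 followed by three U.A.E. rows with `lngCount` 3, this yields one chunk of two rows and one chunk of three rows. The difficulty in Spring Batch is that the completion policy, not the reader, makes this decision.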

The first thing to try is a custom completion policy. The problem is that it doesn't have access to the last item read by the ItemReader: you have to explicitly put the item into a context in the reader and get it back in the policy. I don't like that because it requires modifying the reader or adding reader listeners. Moreover, I don't like the same item being serialized and deserialized back and forth, and I don't feel that JobContext/StepContext is a good place for such data.

There's also RepeatContext, which looks like a better place for such data, but I was not able to get to it easily...

So we finally ended up with a solution like this:

@Bean(name = "localizedCountryNamesStep")
@JobScope
public Step insertCountryStep(
        final StepBuilderFactory stepBuilderFactory,
        final MasterdataCountryNameReader countryNameReader,
        final MasterdataCountryNameProcessor countryNameProcessor,
        final MasterdataCountryNameWriter writer) {
    /* Use the same fixed-commit policy, but update its chunk size dynamically */
    final SimpleCompletionPolicy policy = new SimpleCompletionPolicy();
    return stepBuilderFactory.get("localizedCountryNamesStep")
            .<ExtCountryLng, LocalizedCountryName> chunk(policy)
            .reader(countryNameReader)
            .listener(new ItemReadListener<ExtCountryLng>() {

                @Override
                public void beforeRead() {
                    // do nothing
                }

                @Override
                public void afterRead(final ExtCountryLng item) {
                    /* Update the chunk size after every read: subsequent reads
                    inside the same country (= same chunk) change nothing, since lngCount is always the same there */
                    policy.setChunkSize(item.getLngCount());
                }

                @Override
                public void onReadError(final Exception ex) {
                    // do nothing
                }
            })
            .processor(countryNameProcessor)
            .writer(writer)
            .faultTolerant()
            .skip(RuntimeException.class)
            .skipLimit(Integer.MAX_VALUE) // Batch does not support unlimited skip
            .retryLimit(0) // this solution disables only retry, but not recover
            .build();
}

It works and requires minimal code changes, but it still looks a bit ugly to me. So I'm wondering: is there a more elegant way to do a dynamic chunk size in Spring Batch when all the required information is already available in the ItemReader?

FlasH from Ru
  • afterRead doesn't sound like the right spot to change the chunk size; I would place it in afterWrite so that it takes effect on the next chunk – Michael Pralow May 23 '16 at 13:38
  • logically `afterWrite` sounds right, but 1) you don't have that information after writing a chunk w/o an extra DB query 2) the size of the first chunk should still be determined somehow - another additional DB query? – FlasH from Ru May 23 '16 at 13:41
  • Are you wiping the target table out before your process? Or is this just a one-time job? – Dean Clark May 24 '16 at 13:26
  • @DeanClark, nope, that's a full-scale "reconciliation": new records are inserted, updated records get updated, deleted records are deleted. That's why it's essential to feed the writer _all_ localized country names related to a single country at once. – FlasH from Ru May 24 '16 at 13:38

1 Answer


The easiest way would be to simply partition your step by country. That way each country would get its own step, and you would also be able to thread across countries for increased performance.

If it needs to be a single reader, you can wrap a delegate PeekableItemReader and extend SimpleCompletionPolicy to accomplish your goal.

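import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.PeekableItemReader;
import org.springframework.batch.repeat.RepeatContext;
import org.springframework.batch.repeat.policy.SimpleCompletionPolicy;

public class CountryPeekingCompletionPolicyReader extends SimpleCompletionPolicy implements ItemReader<CountrySpecificItem> {

    private PeekableItemReader<? extends CountrySpecificItem> delegate;

    private CountrySpecificItem currentReadItem = null;

    // Setter needed so that the "delegate" property can be injected
    public void setDelegate(final PeekableItemReader<? extends CountrySpecificItem> delegate) {
        this.delegate = delegate;
    }

    @Override
    public CountrySpecificItem read() throws Exception {
        // remember the last item read so the completion check can compare against it
        currentReadItem = delegate.read();
        return currentReadItem;
    }

    @Override
    public RepeatContext start(final RepeatContext context) {
        return new ComparisonPolicyTerminationContext(context);
    }

    protected class ComparisonPolicyTerminationContext extends SimpleTerminationContext {

        public ComparisonPolicyTerminationContext(final RepeatContext context) {
            super(context);
        }

        @Override
        public boolean isComplete() {
            final CountrySpecificItem nextReadItem;
            try {
                // peek() declares checked exceptions, so they have to be handled here
                nextReadItem = delegate.peek();
            } catch (final Exception e) {
                throw new IllegalStateException("Unable to peek at the next item", e);
            }

            // no current item or end of input: close the chunk
            if (currentReadItem == null || nextReadItem == null) {
                return true;
            }

            // the chunk is complete once the next item belongs to another country
            return !currentReadItem.isSameCountry(nextReadItem);
        }
    }
}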

Then in your context you would define:

<batch:tasklet>
    <batch:chunk chunk-completion-policy="countrySpecificCompletionPolicy" reader="countrySpecificCompletionPolicy" writer="someWriter" />
</batch:tasklet>

<bean id="countrySpecificCompletionPolicy" class="CountryPeekingCompletionPolicyReader">
     <property name="delegate" ref="peekableReader" />
</bean>


<bean id="peekableReader" class="YourPeekableItemReader" />
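The boundary check in isComplete() can also be tried in isolation. The following is a minimal sketch without Spring Batch on the classpath: `PeekableListReader` and `PeekChunkingDemo` are hypothetical names, and the reader only mimics the read()/peek() contract (returning null at the end of input) over an in-memory list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical in-memory reader that mimics read()/peek(); not a Spring Batch class.
class PeekableListReader<T> {
    private final Iterator<T> iterator;
    private T next;

    PeekableListReader(List<T> items) {
        this.iterator = items.iterator();
        this.next = iterator.hasNext() ? iterator.next() : null;
    }

    // Returns the upcoming item without consuming it, or null at end of input.
    T peek() {
        return next;
    }

    // Returns the upcoming item and advances, or null at end of input.
    T read() {
        T result = next;
        next = iterator.hasNext() ? iterator.next() : null;
        return result;
    }
}

class PeekChunkingDemo {
    // Groups items into chunks, closing a chunk when the peeked item has another country.
    static <T> List<List<T>> chunkByCountry(PeekableListReader<T> reader, Function<T, String> countryOf) {
        List<List<T>> chunks = new ArrayList<>();
        List<T> chunk = new ArrayList<>();
        T current;
        while ((current = reader.read()) != null) {
            chunk.add(current);
            T upcoming = reader.peek();
            // Same decision as the completion check: end of input or a country change
            if (upcoming == null || !countryOf.apply(upcoming).equals(countryOf.apply(current))) {
                chunks.add(chunk);
                chunk = new ArrayList<>();
            }
        }
        return chunks;
    }
}
```

Unlike the count-based approach from the question, this needs no lng_count column: the chunk boundary is derived purely from comparing the current item with the next one.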

Edit: Thinking back over your issue, partitioning strikes me as the cleanest approach. Using a partitioned step, each ItemReader (make sure scope="step") will be passed a single countryName from the step execution context. Yes, you'll need a custom Partitioner class to build up your map of execution contexts (one entry per country) and a hard-coded commit interval large enough to accommodate your largest unit of work, but after that everything is very boilerplate, and since each slave step will only be a single chunk, restart should be a relative breeze for any countries that might hit issues.
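As a rough illustration of the "map of execution contexts" mentioned above, here is a library-free sketch. In Spring Batch the real hook is the Partitioner interface, whose partition(int gridSize) method returns a Map<String, ExecutionContext>; in this sketch a plain Map stands in for ExecutionContext, and the class name, the "partition-" prefix and the "countryCode" key are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CountryPartitionSketch {
    // Builds one context per country, keyed by a unique partition name.
    // A plain Map stands in for Spring Batch's ExecutionContext here.
    static Map<String, Map<String, Object>> partition(List<String> countryCodes) {
        Map<String, Map<String, Object>> partitions = new LinkedHashMap<>();
        for (String code : countryCodes) {
            Map<String, Object> context = new LinkedHashMap<>();
            // hypothetical key that a step-scoped reader would use to filter its query
            context.put("countryCode", code);
            partitions.put("partition-" + code, context);
        }
        return partitions;
    }
}
```

Each entry would then drive one execution of the partitioned step, with the reader restricted to the country found in its own context.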

Dean Clark
  • That's actually where we started from :) But it's my belief (correct me if I'm wrong) that such partitioning goes against the main Spring Batch concepts: you should usually be working with the exact items you are going to process, and not combine Batch's functionality in your reader - it gives you more fine-grained control over the situation. Even aligned to my partitioning, the peekable reader with the completion strategy in one will work, but a custom implementation for it is still needed... Let's wait for some more answers and if not - this one will be accepted ;) – FlasH from Ru May 24 '16 at 06:55
  • If each partition covers its own country, you could just set the commit interval to something quite large to make sure a commit covers even the largest country. That said, the "pure" Spring Batch approach would be a single reader/writer, chunk sizes that make sense (perhaps 500 or so), and the restartability to pick up and reprocess from a failure mid-country. I actually have another thought that would be more "true north" and will edit my answer shortly. – Dean Clark May 24 '16 at 12:05
  • I tried to implement this solution and got the following error: Bean property 'delegate' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter? Do you have any idea how to fix it? – Nabil Salah Jun 21 '16 at 11:53
  • You probably need a `setDelegate(PeekableItemReader<? extends CountrySpecificItem> delegate)` method... any property needs an associated setter method – Dean Clark Jun 21 '16 at 13:51