7

My assumption

As I understand it, "chunk-oriented processing" in Spring Batch helps me process multiple items efficiently in a single transaction. This includes efficient use of interfaces to external systems. Since external communication carries overhead, it should be limited and chunk-oriented too. That's why we have the commit interval for the ItemWriter. What I don't get is why the ItemReader still has to read item by item. Why can't I read in chunks as well?

Problem description

In my step, the reader has to call a web service, and the writer sends this information to another web service. That's why I want to make as few calls as possible.

The interface of the ItemWriter is chunk-oriented, as you surely know:

public abstract void write(List<? extends T> paramList) throws Exception;

But the ItemReader is not:

public abstract T read() throws Exception;

As a workaround I implemented a ChunkBufferingItemReader, which reads a list of items, stores them and returns items one-by-one whenever its read() method is called.

But when it comes to exception handling and restarting of a job now, this approach is getting messy. I'm getting the feeling that I'm doing work here, which the framework should do for me.

Question

So am I missing something? Is there any existing functionality in Spring Batch I just overlooked?

In another post it was suggested to change the return type of the ItemReader to a List. But then my ItemProcessor would have to emit multiple outputs from a single input. Is this the right approach?

I'm grateful for any best practices. Thanks in advance :-)

Peter Wippermann

2 Answers

4

This is a draft for an implementation of the read() interface method.

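public T read() throws Exception {
    // Refill the buffer only when it is empty; an empty chunk triggers another fetch.
    while (this.items.isEmpty()) {
        final List<T> newItems = readChunk();
        if (newItems == null) {
            // No more chunks: returning null signals the end of the input.
            return null;
        }
        this.items.addAll(newItems);
    }
    // Hand the buffered items to the framework one at a time.
    return this.items.pop();
}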

Please note that items is a buffer holding the items that were read in chunks but have not yet been requested by the framework.
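For completeness, here is a minimal self-contained sketch of how the draft above could sit inside a reusable base class. Everything beyond read() is an assumption for illustration: the local ItemReader interface only stands in for Spring Batch's interface so the example compiles on its own, readChunk() is taken to return null once the source is exhausted, the buffer is assumed to be a FIFO Deque, and PagedDemoReader is a hypothetical subclass that serves prepared pages instead of calling a web service. Restart state is not handled here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Stand-in for org.springframework.batch.item.ItemReader so the sketch compiles alone.
interface ItemReader<T> {
    T read() throws Exception;
}

// Hypothetical base class: subclasses fetch a whole chunk per remote call.
abstract class ChunkBufferingItemReader<T> implements ItemReader<T> {

    // FIFO buffer of items fetched but not yet handed to the framework.
    private final Deque<T> items = new ArrayDeque<>();

    // Assumed contract: next chunk, or null once the source is exhausted.
    protected abstract List<T> readChunk() throws Exception;

    @Override
    public T read() throws Exception {
        while (items.isEmpty()) {
            List<T> newItems = readChunk();
            if (newItems == null) {
                return null; // null tells the framework the input is exhausted
            }
            items.addAll(newItems);
        }
        return items.poll(); // oldest buffered item first
    }
}

// Demo subclass that serves prepared pages and counts the simulated remote calls.
class PagedDemoReader extends ChunkBufferingItemReader<String> {
    private final Iterator<List<String>> pages;
    int remoteCalls = 0;

    PagedDemoReader(List<List<String>> pages) {
        this.pages = pages.iterator();
    }

    @Override
    protected List<String> readChunk() {
        remoteCalls++;
        return pages.hasNext() ? pages.next() : null;
    }
}
```

With two pages of two items and one item, the three items come back in order and the source is contacted only once per page, plus one final call that reports the end of the input.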

Peter Wippermann
0

Spring Batch uses a 'chunk-oriented' processing style. This covers the full cycle of read, process and write, not just chunked reading or writing.

Chunk-oriented processing works as follows:

  1. Read an item using ItemReader (Single Item)
  2. Process it using ItemProcessor, and aggregate the result (Result List is updated one by one).
  3. Once the commit interval is reached, the entire aggregated result (Result List) is written out using ItemWriter and then the transaction is committed.

Here is the code representation from the Spring Batch documentation:

List items = new ArrayList();
for (int i = 0; i < commitInterval; i++) {
    Object item = itemReader.read();
    Object processedItem = itemProcessor.process(item);
    items.add(processedItem);
}
itemWriter.write(items);

As you said, if you need your reader to return multiple items, make its return type a List. If your processor then also returns a List, your writer will receive a List of Lists.

Here is the code representation of the new case

List<List<Object>> resultList = new ArrayList<List<Object>>();
for (int i = 0; i < commitInterval; i++) {
    List<Object> items = itemReader.read();
    List<Object> processedItems = itemProcessor.process(items);
    resultList.add(processedItems);
}
itemWriter.write(resultList);
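
On the writer side, the List of Lists can be merged before the outbound call, so that each chunk still results in a single request. The following is only a sketch under assumptions: the local ItemWriter interface mirrors the write(List<? extends T>) signature quoted in the question so the example compiles without Spring Batch, and FlatteningItemWriter and its send() method are hypothetical names, with send() standing in for the real web service call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in mirroring the ItemWriter signature quoted in the question.
interface ItemWriter<T> {
    void write(List<? extends T> items) throws Exception;
}

// Hypothetical writer: merges the lists of one chunk and makes one outbound call.
class FlatteningItemWriter<T> implements ItemWriter<List<T>> {

    // Records every batch handed to send(), for demonstration only.
    final List<List<T>> sentBatches = new ArrayList<>();

    @Override
    public void write(List<? extends List<T>> chunk) {
        List<T> flat = new ArrayList<>();
        for (List<T> items : chunk) {
            flat.addAll(items); // keep the original item order
        }
        send(flat);
    }

    // Placeholder for the real web service call; here it only records the batch.
    protected void send(List<T> batch) {
        sentBatches.add(batch);
    }
}
```

This keeps the number of outbound calls low, but the framework's counters still refer to Lists rather than individual items.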
Karthik Chandraraj
  • Thanks for your elaborate answer. Still, I doubt whether lists of items are efficient. How can I leverage Spring Batch's fallback mechanism then? If there is an exception during the processing of a chunk, Spring Batch switches to processing the items one by one. This won't work any longer. I still think Spring Batch is lacking a feature here. – Peter Wippermann Apr 01 '13 at 16:57
  • Returning a List from the reader also has the drawback that you can't easily filter items by returning null in the processor (so that they are really counted as filtered), and that statistics like readCount, writeCount, filterCount etc. no longer represent the number of items but the number of Lists of items. – James Jul 11 '14 at 19:37