
I'm trying to build the following implementation.

Due to size constraints, I have to split my output file into, for example, 10k-row chunks.

So I need to dump the first 10k rows into file "out1.csv", the next 10k into file "out2.csv", and so on.

With a single output file, the batch:chunk schema with reader-processor-writer is straightforward.

The output stream is opened in the batch:streams XML section inside the chunk, so I avoid the "Writer must be open before it can be written to" exception.

I want an implementation that avoids this rigid, preset configuration:

<batch:chunk reader="reader" writer="compositeWriter" commit-interval="10000" processor-transactional="false">
    <batch:streams>
        <batch:stream ref="writer1" />
        <batch:stream ref="writer2" />
        <batch:stream ref="writer3" />
        ...
        <batch:stream ref="writer20" />
    </batch:streams>
</batch:chunk>

<bean id="writer1" class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step">
        <property name="resource" value="out1.csv" />
        ...
</bean>

<bean id="writer2" class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step">
        <property name="resource" value="out2.csv" />
        ...
</bean>

...
<!-- writer 20 -->

Assuming that 20 writers are enough, I'm looking for a way to create the output writers dynamically (perhaps programmatically), open them, and avoid the above exception.

yaki_nuka

1 Answer


Due to size reasons, I have to split my output file in, for example, 10k row chunks. So, I need to dump 10k in file "out1.csv", the next 10k in file "out2.csv", and so on.

You seem to be using a CompositeItemWriter, but this is not the way to go here. What you need is the MultiResourceItemWriter, which splits the output by item count. In your case, configure a MultiResourceItemWriter and set itemCountLimitPerResource to 10,000. You can also provide a ResourceSuffixCreator to customize the output file names (out1.csv, out2.csv, etc.).
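To illustrate, here is a minimal sketch of that configuration in the same XML style as the question. Bean ids, the base output path, and the suffix-creator class name (com.example.CsvSuffixCreator) are placeholders, not taken from the question:

```xml
<!-- Sketch only: bean ids, paths, and com.example.CsvSuffixCreator are assumptions. -->
<batch:chunk reader="reader" writer="multiResourceWriter"
             commit-interval="10000" processor-transactional="false" />

<bean id="multiResourceWriter"
      class="org.springframework.batch.item.file.MultiResourceItemWriter" scope="step">
    <!-- base resource; the suffix creator appends an index-based suffix to this name -->
    <property name="resource" value="file:out" />
    <!-- roll over to a new file every 10,000 items -->
    <property name="itemCountLimitPerResource" value="10000" />
    <property name="delegate" ref="delegateWriter" />
    <property name="resourceSuffixCreator" ref="suffixCreator" />
</bean>

<!-- the delegate does the actual writing; note: no 'resource' property here,
     MultiResourceItemWriter sets the resource on it at each rollover -->
<bean id="delegateWriter"
      class="org.springframework.batch.item.file.FlatFileItemWriter">
    ...
</bean>

<!-- a small Java class implementing ResourceSuffixCreator's getSuffix(int index),
     e.g. returning index + ".csv" to produce out1.csv, out2.csv, ... -->
<bean id="suffixCreator" class="com.example.CsvSuffixCreator" />
```

Since the MultiResourceItemWriter is declared as the step's writer and implements ItemStream, it is registered as a stream automatically, so no batch:streams section is needed.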

Mahmoud Ben Hassine
  • thanks. I'll give it a try as soon as possible. One more question: is there any way to split the output by other criteria? – yaki_nuka May 18 '21 at 07:26
  • No, this writer is designed to split the output file by item count (which is what you are looking for according to your description). If you want to split the output by another criteria, you need a custom writer. – Mahmoud Ben Hassine May 18 '21 at 07:30
  • Ok, I accept the answer. I thought the approach for splitting by rows would generalize to splitting by other criteria, so my question wasn't well phrased; that's on me. The answer fixes the question, but not my problem. I'll be more specific next time. Moreover, I don't understand, and am upset about, the downvote on the question. I delayed accepting because of it, but I have finally accepted the answer. Thank you – yaki_nuka May 20 '21 at 19:28
  • No problem. The most important part is to help you. I did not downvote your question, that's not me. Your question was about splitting the output by row (`10k row chunks`) and I tried to answer based on that. For custom criteria, you can get inspiration from the `MultiResourceItemWriter` and create your own writer. – Mahmoud Ben Hassine May 21 '21 at 06:46