
I'm trying to set up a parser for a large XML file, and I would like to take advantage of Spring Batch's partitioning support to process it.

I am new to this framework and can't find any documentation or examples showing how to do this, so I would be grateful for any guidance.

Is it possible to partition this file by the direct children of the root element? For example:

sample.xml (1 GB)

<students>
    <student>
        <name>Sirius Black</name>
        <phone>123</phone>
    </student>
    <student>
        <name>Tom Riddle</name>
        <phone>349</phone>
    </student>
    <student>
        <name>Severus Snape</name>
        <phone>934</phone>
    </student>
</students>

I've studied examples of partitioning flat files, but how can I do the same with XML files?

PS: the direct child element in this XML file is "student".

Kaights

1 Answer


To use a MultiResourcePartitioner you need to have multiple input files.

"I would like to take advantage of the SpringBatch framework characteristics to partition it."

Please note that Spring Batch does not take care of partitioning the file. It is up to you to do this work upfront (using a SystemCommandTasklet for example). However, splitting a huge XML file into multiple files is not as easy as doing it for a flat file (with the split command for example). So using the partitioning technique with Spring Batch is only possible if you manage to split the XML file.
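As a rough illustration of that upfront splitting work, here is a minimal sketch that streams the file with the JDK's StAX API and writes every N direct children of the root into a separate file. The class name XmlSplitter, the part-N.xml file naming and the recordsPerFile parameter are hypothetical, chosen only for this example. It also assumes the root element carries no attributes or namespace declarations that the children depend on, since those are not copied to the output files.

```java
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class XmlSplitter {

    /** Splits input into files holding at most recordsPerFile direct children of the root. */
    public static List<Path> split(Path input, Path outDir, int recordsPerFile)
            throws IOException, XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        // Hardening: do not process DTDs or resolve external entities.
        inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        inFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        Files.createDirectories(outDir);
        List<Path> parts = new ArrayList<>();
        OutputStream out = null;
        XMLEventWriter writer = null;
        String rootName = null;
        int depth = 0;          // 1 = inside the root, 2 = inside a record
        int recordsInPart = 0;

        try (InputStream in = Files.newInputStream(input)) {
            XMLEventReader reader = inFactory.createXMLEventReader(in);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    depth++;
                    if (depth == 1) {
                        // Remember the root name; each part gets its own copy of the root.
                        rootName = event.asStartElement().getName().getLocalPart();
                        continue;
                    }
                    if (depth == 2 && writer == null) {
                        // First record of a new part: open the file and write the root start tag.
                        Path part = outDir.resolve("part-" + parts.size() + ".xml");
                        parts.add(part);
                        out = Files.newOutputStream(part);
                        writer = outFactory.createXMLEventWriter(out, "UTF-8");
                        writer.add(events.createStartDocument("UTF-8", "1.0"));
                        writer.add(events.createStartElement("", "", rootName));
                    }
                }
                // Copy everything that belongs to a record; skip whitespace between records.
                if (depth >= 2 && writer != null) {
                    writer.add(event);
                }
                if (event.isEndElement()) {
                    depth--;
                    if (depth == 1 && ++recordsInPart == recordsPerFile) {
                        finishPart(writer, out, events, rootName);
                        writer = null;
                        out = null;
                        recordsInPart = 0;
                    }
                }
            }
            if (writer != null) {
                // Last part holds fewer than recordsPerFile records.
                finishPart(writer, out, events, rootName);
                writer = null;
                out = null;
            }
            reader.close();
        } finally {
            if (out != null) {
                out.close(); // only reached if an exception interrupted a part
            }
        }
        return parts;
    }

    private static void finishPart(XMLEventWriter writer, OutputStream out,
                                   XMLEventFactory events, String rootName)
            throws IOException, XMLStreamException {
        writer.add(events.createEndElement("", "", rootName));
        writer.add(events.createEndDocument());
        writer.close(); // does not close the underlying stream
        out.close();
    }
}
```

Because the input is read as a stream of events, memory use stays roughly constant regardless of file size. The resulting files could then be handed to a MultiResourcePartitioner, one file per partition.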

I recommend starting with a multi-threaded step and checking whether you get the result/performance you expect.

A similar question can be found here: Parse-load huge XML using Spring Batch framework

Mahmoud Ben Hassine
  • Hi, I'm facing exactly the same situation: I have a lot of huge XML files and I'm trying to speed up the reader, processor and writer. Is a multi-threaded step still the best starting point in this case? – Guilherme Bernardi Mar 13 '23 at 23:09
  • Yes, I think a multi-threaded step is a good start and should improve the performance of the step. If this is still not enough, partitioning is the way to go. – Mahmoud Ben Hassine Mar 14 '23 at 11:28