
We have a scenario where we need to split a large XML file (more than 10 GB) into small chunks. Each chunk should contain 100 or 200 elements. Example XML:

<Employees>
  <Employee id="1">
    <age>29</age>
    <name>Pankaj</name>
    <gender>Male</gender>
    <role>Java Developer</role>
  </Employee>
  <Employee id="3">
    <age>35</age>
    <name>Lisa</name>
    <gender>Female</gender>
    <role>CEO</role>
  </Employee>
  <Employee id="3">
    <age>40</age>
    <name>Tom</name>
    <gender>Male</gender>
    <role>Manager</role>
  </Employee>
  <Employee id="3">
    <age>25</age>
    <name>Meghna</name>
    <gender>Female</gender>
    <role>Manager</role>
  </Employee>
  <Employee id="3">
    <age>29</age>
    <name>Pankaj</name>
    <gender>Male</gender>
    <role>Java Developer</role>
  </Employee>
  <Employee id="3">
    <age>35</age>
    <name>Lisa</name>
    <gender>Female</gender>
    <role>CEO</role>
  </Employee>
  <Employee id="3">
    <age>40</age>
    <name>Tom</name>
    <gender>Male</gender>
    <role>Manager</role>
  </Employee>
</Employees>

I have StAX parser code that splits the file into small chunks, but each output file contains only one complete Employee element, whereas I need 100, 200 or more <Employee> elements in a single file. Here is my Java code:

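import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class SplitXml { // wrapper class name is only for illustration

    public static void main(String[] s) throws Exception {
        // Intended wrapper text for each chunk (not used yet)
        String prefix = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + "\n";
        String suffix = "\n</Employees>\n";
        int count = 0;
        try {
            int i = 0;
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("D:\\Desktop\\Test\\latestxml\\test.xml"));
            xsr.nextTag(); // Advance to the root <Employees> element

            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            // Each pass copies one <Employee> into a new file, because i is incremented every time
            while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                File file = new File("C:\\Users\\test\\Desktop\\xml\\" + "out" + i + ".xml");
                try (FileOutputStream fos = new FileOutputStream(file, true)) {
                    t.transform(new StAXSource(xsr), new StreamResult(fos));
                }
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}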
Mr Lister
Naveen

2 Answers


Do not increment i on every iteration; only increment it (and switch to a new output file) when your count reaches 100 or 200.

For example:

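String outputPath = "/test/path/foo.txt";

while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
    // Append to the current output file; try-with-resources closes it each time
    try (FileOutputStream file = new FileOutputStream(outputPath, true)) {
        ...
        ...
    }
    count++;
    if (count == 100) {
        // 100 elements written: switch to the next file
        i++;
        outputPath = "/test/path/foo" + i + ".txt";
        count = 0;
    }
}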
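The counter approach in this answer amounts to integer division: element number k (counting from 0) goes into file k / 100. A minimal sketch that checks the reset-and-increment loop against that formula; `ChunkIndex`, `byCounter` and `byDivision` are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkIndex {
    // File index for each element, using the counter-reset approach from the answer
    static List<Integer> byCounter(int elements, int chunkSize) {
        List<Integer> indexes = new ArrayList<>();
        int i = 0;     // current file number
        int count = 0; // elements written to the current file
        for (int e = 0; e < elements; e++) {
            indexes.add(i); // element e goes into file i
            count++;
            if (count == chunkSize) { // file is full: move on to the next one
                i++;
                count = 0;
            }
        }
        return indexes;
    }

    // Same mapping computed directly with integer division
    static List<Integer> byDivision(int elements, int chunkSize) {
        List<Integer> indexes = new ArrayList<>();
        for (int e = 0; e < elements; e++) {
            indexes.add(e / chunkSize);
        }
        return indexes;
    }

    public static void main(String[] args) {
        // 7 employees in chunks of 3
        System.out.println(byCounter(7, 3));  // [0, 0, 0, 1, 1, 1, 2]
        System.out.println(byDivision(7, 3)); // [0, 0, 0, 1, 1, 1, 2]
    }
}
```

Either form works; the division version is handy if you prefer to derive the file name from a single running total instead of keeping two counters in sync.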
Simmant

I hope I understood you correctly, but you only need to increment count each time you add one Employee, and start a new file whenever it reaches 100:

File file = new File("out" + i + ".xml");
FileOutputStream fos = new FileOutputStream(file, true);
appendStuff("<Employees>", file);
while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
    count++;
    t.transform(new StAXSource(xsr), new StreamResult(fos));
    if (count == 100) {
        // Chunk full: close it and start the next file
        count = 0;
        i++;
        appendStuff("</Employees>", file);
        fos.close();
        file = new File("out" + i + ".xml");
        fos = new FileOutputStream(file, true);
        appendStuff("<Employees>", file);
    }
}
// Close the last chunk too, otherwise it has no closing root tag
appendStuff("</Employees>", file);
fos.close();

It's not very nice, but you get the idea.

private static void appendStuff(String content, File file) throws IOException {
    // Open the file in append mode, write the text and close it again
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(file, true))) {
        bw.write(content);
    }
}
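A self-contained sketch of the whole split, assuming every chunk should be wrapped in the root tag (as asked in the comments). It swaps the Transformer/StAXSource copy used above for a plain XMLEventReader/XMLEventWriter loop, so the root start and end tags go through the same writer as the copied elements instead of a second FileWriter, and no XML declaration is written per element. `XmlChunker`, `split` and `ChunkOpener` are hypothetical names; it assumes non-whitespace text directly inside the root can be dropped and that any namespaces are declared on the root or on the children themselves.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class XmlChunker {
    /** Hypothetical callback: supplies the Writer for chunk n, e.g. a file "out" + n + ".xml". */
    public interface ChunkOpener {
        Writer open(int n) throws IOException;
    }

    /**
     * Copies the children of the root element into chunks of at most chunkSize
     * children, wrapping every chunk in a copy of the root tags. Returns the chunk count.
     */
    public static int split(Reader input, int chunkSize, ChunkOpener opener)
            throws IOException, XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        StartElement root = null;    // e.g. <Employees>
        XMLEventWriter chunk = null; // writer for the chunk currently being filled
        Writer out = null;           // underlying Writer of that chunk
        int depth = 0;               // 1 = directly inside root, 2+ = inside a child such as <Employee>
        int inChunk = 0;             // children copied into the current chunk
        int chunks = 0;

        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (e.isStartElement()) {
                depth++;
                if (depth == 1) { // remember the root start tag, copy it later
                    root = e.asStartElement();
                    continue;
                }
                if (depth == 2 && chunk == null) { // first child of a new chunk
                    out = opener.open(chunks++);
                    chunk = outFactory.createXMLEventWriter(out);
                    chunk.add(root); // reuse the root start tag, including its attributes
                }
            }
            if (depth >= 2) {
                chunk.add(e); // copy everything belonging to the current child
            }
            if (e.isEndElement()) {
                depth--;
                if (depth == 1 && ++inChunk == chunkSize) { // chunk is full
                    closeChunk(chunk, out, root, events);
                    chunk = null;
                    inChunk = 0;
                }
            }
        }
        if (chunk != null) { // last, partially filled chunk
            closeChunk(chunk, out, root, events);
        }
        reader.close();
        return chunks;
    }

    private static void closeChunk(XMLEventWriter chunk, Writer out, StartElement root,
                                   XMLEventFactory events) throws IOException, XMLStreamException {
        QName name = root.getName();
        chunk.add(events.createEndElement(name.getPrefix(), name.getNamespaceURI(), name.getLocalPart()));
        chunk.flush();
        chunk.close(); // per the StAX spec this does not close the underlying Writer
        out.close();
    }
}
```

For the question's file, assuming it is UTF-8, a call such as `XmlChunker.split(Files.newBufferedReader(Paths.get("test.xml")), 100, n -> Files.newBufferedWriter(Paths.get("out" + n + ".xml")))` would write out0.xml, out1.xml and so on, each with up to 100 Employee elements; since events are processed one at a time, memory use should not grow with the size of the input.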
Kev
  • How do I write the opening and closing tags into each file along with the split data? – Naveen Dec 08 '15 at 08:19
  • The code is working fine, thank you for it. But I don't know why "" is getting appended in front of each tag. – Naveen Dec 08 '15 at 10:01
  • You can read the StAX documentation, or do it the improper way: open the file and replace it http://www.tutorialspoint.com/java/java_string_replaceall.htm – Kev Dec 08 '15 at 10:11
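Instead of post-processing the files with replaceAll, note that the text appearing in front of each element is most likely an XML declaration (`<?xml version="1.0" ...?>`): by default an identity Transformer writes one at the start of every transform() call, and the loop calls it once per Employee. Assuming that is the cause, it can be switched off with `OutputKeys.OMIT_XML_DECLARATION`. A minimal sketch comparing both settings; `DeclarationDemo` and `copy` are hypothetical names:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DeclarationDemo {
    // Copies xml through an identity Transformer, optionally suppressing the <?xml ...?> declaration
    static String copy(String xml, boolean omitDeclaration) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        if (omitDeclaration) {
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        }
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String employee = "<Employee id=\"1\"><name>Pankaj</name></Employee>";
        System.out.println(copy(employee, false)); // starts with an XML declaration
        System.out.println(copy(employee, true));  // just the <Employee> element
    }
}
```

In the answer's loop this would mean calling `t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");` once, right after `tf.newTransformer()`, before any transform() call.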