I am using VTD-XML to split a large XML file into smaller XML files. Everything works well except the following call:

autoPilot.selectXPath("//nodeName")

It is skipping over the first 3 nodes for some reason.

EDIT: vtd-xml-author pointed out that LOG.info("xpath has found "+ ap.evalXPath() +" items"); does not return the count but returns the node index.

The new split xml file is missing the first three nodes from the original file.

Here is the basic XML layout. I can't show the real XML data, but this is what it looks like:

<rootNode>
          <parentNode>
                      <contentNode>..children inside...</contentNode>
                      <contentNode>..children inside...</contentNode>
                      <contentNode>..children inside...</contentNode>
                      <contentNode>..children inside...</contentNode>
          </parentNode>
</rootNode>

And here is the function I am using to split the XML:

public void splitXml(String parentNode, String contentNode) throws Exception {
    LOG.info("Splitting " + outputName + parentNode);
    VTDGen vg = new VTDGen();

    if (vg.parseFile(xmlSource, true)) {

        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("//" + contentNode);

        int i = -1; // node index returned by evalXPath(), not a count
        int k = 0;  // number of nodes written to the current output file
        byte[] ba = vn.getXML().getBytes();
        FileOutputStream fos = getNewXml(parentNode);
        while ((i = ap.evalXPath()) != -1) {
            // If the file size is at the maximum, finish it and create a new file.
            if (fos.getChannel().size() > maxFileSize) {
                finishXml(fos, contentNode);
                LOG.info("Finished file with " + k + " nodes");
                fos = getNewXml(contentNode);
                k = 0;
            }
            k++;
            // The fragment is packed into a long: offset in the low 32 bits, length in the high 32 bits.
            long l = vn.getElementFragment();
            fos.write(ba, (int) l, (int) (l >> 32));
            fos.write("\n".getBytes());
        }
        finishXml(fos, contentNode);
        LOG.info("Finished Splitting " + outputName + " " + parentNode + " with " + k + " nodes");
    } else {
        LOG.info("Parse Failed");
    }
}

Edit: added a counter to the while loop.
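The `fos.write(ba, (int)l, (int)(l>>32))` call above depends on how `getElementFragment()` encodes its result: the offset sits in the lower 32 bits of the `long` and the length in the upper 32 bits. Here is a minimal sketch of that bit layout in plain Java, without VTD-XML on the classpath. The class `FragmentBits` and its `pack` helper are hypothetical names used only for illustration; they are not part of the library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of VTD-XML: illustrates the offset/length layout only.
final class FragmentBits {

    // Builds a long with the offset in the low 32 bits and the length in the high 32 bits.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    // Same casts as in the question's fos.write(...) call.
    static int offset(long fragment) {
        return (int) fragment;
    }

    static int length(long fragment) {
        return (int) (fragment >> 32);
    }

    public static void main(String[] args) {
        byte[] doc = "<r><a>1</a><a>2</a></r>".getBytes(StandardCharsets.US_ASCII);
        // The first <a> element starts at byte 3 and is 8 bytes long.
        long fragment = pack(3, 8);
        String element = new String(doc, offset(fragment), length(fragment), StandardCharsets.US_ASCII);
        System.out.println(element); // prints <a>1</a>
    }
}
```

If the wrong half of the `long` were used for the offset, the written bytes would come from the wrong place in the buffer, so this is worth ruling out early when output looks incomplete.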

Paul Parker
  • Could you post your full XML? I'm not seeing 12 nodes in that. – JWiley Oct 26 '12 at 14:24
  • I cannot post the actual XML file because it would be a HIPAA violation. I can assure you that there are 12 nodes in that section of the XML. – Paul Parker Oct 26 '12 at 14:59
  • Where would I find that information? I picked this up a month or two ago. – Paul Parker Oct 29 '12 at 13:59
  • @vtd-xml-author I grabbed the latest version and am still having the problem. – Paul Parker Oct 29 '12 at 15:00
  • This statement: LOG.info("xpath has found "+ ap.evalXPath() +" items"); doesn't tell you how many items are in the XPath evaluation result. It only tells you the node index value of the first one if it is not zero, so there is something wrong already. – vtd-xml-author Oct 29 '12 at 22:25
  • OK, so that explains the log info. But why would my output file be missing the first 3 records? – Paul Parker Oct 30 '12 at 14:54
  • Can you put a counter variable in the while loop to record how many nodes are in the result set? If this returns 9 instead of 12, I will try to duplicate the problem on my end. – vtd-xml-author Oct 30 '12 at 18:38
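Since `evalXPath()` returns a node index rather than a count, it can help to have an independent count to compare the loop counter against. The sketch below uses the JDK's built-in `javax.xml.xpath` API instead of VTD-XML, purely as a cross-check. The class name `NodeCountCheck` is hypothetical, and because this approach builds a DOM in memory it is only practical on a small sample, not on the full large file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical cross-check using the JDK XPath engine (not VTD-XML).
final class NodeCountCheck {

    // Returns how many elements named nodeName appear anywhere in the document.
    static int countNodes(String xml, String nodeName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Double count = (Double) XPathFactory.newInstance()
                .newXPath()
                .evaluate("count(//" + nodeName + ")", doc, XPathConstants.NUMBER);
        return count.intValue();
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the sample layout in the question.
        String xml = "<rootNode><parentNode>"
                + "<contentNode>a</contentNode><contentNode>b</contentNode>"
                + "<contentNode>c</contentNode><contentNode>d</contentNode>"
                + "</parentNode></rootNode>";
        System.out.println(countNodes(xml, "contentNode")); // prints 4
    }
}
```

If this count and the total from the `while` loop disagree on the same input, the problem is in the extraction; if they agree, the missing records are more likely lost when the files are written or viewed.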

1 Answer

As vtd-xml-author suggested, I added a counter to the while loop.

        while ((i = ap.evalXPath()) != -1) {
            // If the file size is at the maximum, create a new file.
            if (fos.getChannel().size() > maxFileSize) {
                finishXml(fos, contentNode);
                LOG.info("Finished file with " + k + " nodes");
                fos = getNewXml(contentNode);
                k = 0;
            }
            k++; // count nodes written to the current file
            long l = vn.getElementFragment();
            fos.write(ba, (int) l, (int) (l >> 32));
            fos.write("\n".getBytes());
        }

The first time I ran it, the output was missing only 1 record. I then deleted the output XML files and the folder and re-ran the splitter. This time the log showed the correct count and the files were split correctly. I repeated the process numerous times, both with and without deleting the created folder and files, and got the same correct results every time. My guess is that the IDE or something else wasn't refreshing correctly.
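To rule out stale output when comparing runs, one option is to clear the output folder programmatically before each split. This is a minimal sketch using only `java.nio.file`. The class `OutputDirReset` and its `resetDir` method are hypothetical names, and the sketch assumes the splitter writes into a single dedicated directory, which the original code does not show.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: empties an output directory so each run starts clean.
final class OutputDirReset {

    // Deletes dir and everything under it (if present), then recreates it empty.
    static void resetDir(Path dir) throws IOException {
        if (Files.exists(dir)) {
            List<Path> toDelete;
            try (Stream<Path> paths = Files.walk(dir)) {
                // Reverse order puts children before their parent directories.
                toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : toDelete) {
                Files.delete(p);
            }
        }
        Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split-output");
        Files.write(dir.resolve("stale.xml"), "<old/>".getBytes());
        resetDir(dir);
        try (Stream<Path> remaining = Files.list(dir)) {
            System.out.println(remaining.count()); // prints 0
        }
    }
}
```

Calling something like `resetDir` before `splitXml` means any file seen afterwards was produced by the current run, which makes a refresh or caching issue in the IDE easier to tell apart from a real bug in the splitter.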

Paul Parker