I need to performance-test the VTD-XML library, not just for plain parsing but for parsing combined with an additional transformation. I have a 30 MB input XML file that I transform into another XML document with custom logic. I want to remove everything on my side that slows the process down (that is, anything caused by poor use of the VTD library). I searched for optimization tips but could not find any. Here is what I have noticed and what I would like to know:
- Which is better to use for selection, selectXPath or selectElement?
- Parsing without namespace awareness is much faster.
File file = new File(fileName);
VTDGen vtdGen = new VTDGen();
vtdGen.setDoc_BR(new byte[(int) file.length()]);
vtdGen.parse(false);
- Should I read the file into a byte array myself, or pass the file name to VTDGen?
final VTDGen vg = new VTDGen();
vg.parseFile("books.xml", false);
or
// open a file and read the content into a byte array
File f = new File("books.xml");
FileInputStream fis = new FileInputStream(f);
byte[] b = new byte[(int) f.length()];
fis.read(b);
VTDGen vg = new VTDGen();
vg.setDoc(b);
vg.parse(true);
With the second approach I measured a speed-up of only about 0.01 times, which could be caused by anything.
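A difference this small is hard to separate from JIT warm-up, garbage collection and disk-cache effects. The following is only a minimal sketch of a timing harness, using nothing beyond the JDK; the class name `ParseTimer` is hypothetical, and the `Runnable` stands in for whichever parse variant is being measured. A dedicated benchmark tool would give more trustworthy numbers.

```java
import java.util.Arrays;

/** Hypothetical helper: runs a task repeatedly and reports a median time. */
final class ParseTimer {

    /**
     * Runs the task `warmups` times untimed, then `runs` times timed,
     * and returns the median sample in nanoseconds (upper median for even counts).
     */
    static long medianNanos(Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; replace the lambda body with the parse call under test.
        long ns = medianNanos(() -> {
            byte[] b = new byte[1 << 20];
            Arrays.fill(b, (byte) 1);
        }, 5, 21);
        System.out.println("median: " + (ns / 1_000) + " us");
    }
}
```

Timing both variants with the same warm-up and run counts, on the same file, makes it easier to tell whether the gap is larger than the run-to-run spread.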
How does the byte-array approach differ from parseFile? With parseFile the file size is limited to 2 GB with namespace awareness turned on and 1 GB without it, but what is the limit for the byte-array approach?
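Whatever VTD-XML itself allows, the byte-array route has a ceiling on the JVM side: Java arrays are indexed by `int`, so a single `byte[]` cannot hold more than `Integer.MAX_VALUE` bytes (in practice slightly fewer). Separately, a single `InputStream.read(byte[])` call is not guaranteed to fill the array. A minimal sketch that covers both points, assuming a hypothetical helper class `XmlBytes`; the returned array would then be handed to `vg.setDoc(b)` as in the snippet above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: loads a whole XML file into one byte array. */
final class XmlBytes {

    // Conservative cap: some JVMs cannot allocate arrays of exactly Integer.MAX_VALUE elements.
    static final long MAX_ARRAY_BYTES = Integer.MAX_VALUE - 8L;

    /** Reads the complete file, failing early if it cannot fit into a single array. */
    static byte[] readFully(Path xml) throws IOException {
        long size = Files.size(xml);
        if (size > MAX_ARRAY_BYTES) {
            throw new IOException("File too large for a single byte[]: " + size + " bytes");
        }
        // readAllBytes loops internally until the end of the file is reached.
        return Files.readAllBytes(xml);
    }

    public static void main(String[] args) throws IOException {
        Path xml = Paths.get(args.length > 0 ? args[0] : "books.xml");
        byte[] b = readFully(xml);
        System.out.println("read " + b.length + " bytes");
    }
}
```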
- Reuse buffers
You can ask VTDGen to reuse VTD buffers for the next parsing task. Otherwise, by default, VTDGen will allocate new buffer for each parsing run.
Can you give an example for that?
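The quoted tip is about VTDGen's internal VTD buffers. The same idea can be applied on the caller's side to the input array, so that a batch of files does not allocate a fresh `byte[]` each time. The sketch below shows only that caller-side part with plain JDK classes; `ReusableInput` is a hypothetical name, and the hand-off to VTD-XML is described afterwards as an assumption rather than shown in code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: one input buffer shared by successive parsing runs. */
final class ReusableInput {

    private byte[] buf = new byte[1 << 16];

    /** Reads the whole file into the shared buffer and returns the number of valid bytes. */
    int load(Path xml) throws IOException {
        int len = 0;
        try (InputStream in = Files.newInputStream(xml)) {
            int n;
            while ((n = in.read(buf, len, buf.length - len)) != -1) {
                len += n;
                if (len == buf.length) {
                    if (buf.length > Integer.MAX_VALUE / 2) {
                        throw new IOException("file too large for this sketch");
                    }
                    // Grow geometrically; the larger array is kept for later files.
                    buf = Arrays.copyOf(buf, buf.length * 2);
                }
            }
        }
        return len;
    }

    /** The shared array; only the first `len` bytes returned by load() are valid. */
    byte[] buffer() {
        return buf;
    }
}
```

For each file, `int len = input.load(path);` fills the buffer. Assuming the three-argument overload `setDoc_BR(byte[], int, int)` is available in the VTD-XML version in use, the next step would be `vg.setDoc_BR(input.buffer(), 0, len); vg.parse(false);`. Because VTD-XML keeps the document bytes in memory as they are, anything obtained from the previous parse has to be finished with before the buffer is loaded again.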
- Adjust LC level to 5
By default, it is 3. But you can set it to 5. When your XML are deeply nested, setting LC level to 5 results in better XPath performance. But it increases memory usage and parsing time very slightly.
VTDGen vg = new VTDGen();
vg.selectLcDepth(5);
But with 5 I get a runtime exception; it only works with 3.
- Indexing
Use VTD+XML indexing. Instead of parsing XML files at the time of processing a request, you can pre-index your XML into VTD+XML format and dump it on disk. When the processing request commences, simply load the VTD+XML into memory and voila, parsing is no longer needed!
VTDGen vg = new VTDGen();
if (vg.parseFile(inputName, true)) {
    vg.writeIndex(new FileOutputStream(outputName));
}
Does anyone know how to use it? What happens if the file changes, and how do I trigger a new re-indexing? And if there is a 10 KB change in a 3 GB file, will the whole file have to be parsed again, or only the changed lines?
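The snippet above writes the complete index in one call, so one simple way to decide when to rebuild it is to compare file timestamps. This is a general file-handling sketch and an assumption about how re-indexing could be triggered, not a VTD-XML feature; `IndexFreshness` and `needsReindex` are hypothetical names, and the `.vxl` extension in `main` is only illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: decides whether a stored index is out of date. */
final class IndexFreshness {

    /** True when the index is missing or older than the XML file it was built from. */
    static boolean needsReindex(Path xml, Path index) throws IOException {
        if (!Files.exists(index)) {
            return true;
        }
        return Files.getLastModifiedTime(xml)
                .compareTo(Files.getLastModifiedTime(index)) > 0;
    }

    public static void main(String[] args) throws IOException {
        Path xml = Paths.get(args.length > 0 ? args[0] : "books.xml");
        Path index = Paths.get(args.length > 1 ? args[1] : "books.vxl");
        System.out.println(needsReindex(xml, index)
                ? "index is stale: parse the XML and write the index again"
                : "index is current: load it instead of parsing");
    }
}
```

Timestamps only say that something changed, not what changed, so this check by itself always leads to a full rebuild of the index.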
- Overwrite feature
The overwrite feature, also known as data templating: because VTD-XML retains XML in memory as is, you can actually create a template XML file (pre-indexed in VTD+XML) whose value fields are left blank and let your app fill in the blanks, thus creating XML data that never needs to be parsed.