In my organization we use XSLT 1.0 with XPath 1.0 for large XML transformations, but now we need to migrate.

I am using VTD-XML to parse and transform a large XML file (200 MB). I have over 20 XPath selects, and for each of them I want to define an AutoPilot:

    AutoPilot relationAutoPilot = new AutoPilot();
    relationAutoPilot.selectXPath("object[@class='Example']");

    AutoPilot first = new AutoPilot();
    first.selectXPath("other");
    ....

    public void methodCalculate(final VTDNav navigator, final AutoPilot partAutoPilot, final AutoPilot first, final AutoPilot second, final AutoPilot third, final AutoPilot fourth) {
        navigator.push();
        partAutoPilot.bind(navigator);
        int i = -1;
        while ((i = partAutoPilot.evalXPath()) != -1) {
            ...
            method3(navigator, first, second, third, fourth);
            ..
        }
    }

    public void method3(final VTDNav navigator, final AutoPilot first, final AutoPilot second, final AutoPilot third, final AutoPilot fourth) {
        navigator.push();
        first.bind(navigator);
        int i = -1;

        while ((i = first.evalXPath()) != -1) {
            // Do some business logic
            method2(navigator, second, third, fourth);
            ..
        }
    }

So I tried to create them at the top of the method hierarchy, outside any while or for-each loops, to save CPU time and memory, but I ran into a problem. I am wondering how to pass the objects, because:

  1. Simple case

    public void methodCalculate(final VTDNav navigator, final AutoPilot partAutoPilot)

With one AutoPilot this is fine.

  2. More complex case: methodCalculate calls method3, which calls method2, which calls method1

    public void methodCalculate(final VTDNav navigator, final AutoPilot partAutoPilot, final AutoPilot first, final AutoPilot second, final AutoPilot third, final AutoPilot fourth) {
        method3(navigator, first, second, third, fourth);
    }
    public void method3(final VTDNav navigator, final AutoPilot first, final AutoPilot second, final AutoPilot third, final AutoPilot fourth) {
        method2(navigator, second, third, fourth);
    }
    ....
    

Every method needs a sub-select with a relative XPath, so I need to pass four AutoPilots. If new transformation rules come along, another method (say method_0) may appear and I will need to add yet another AutoPilot. So how should I proceed in this situation? How can I pass the AutoPilot objects more efficiently, given that they consume a lot of memory and are very expensive to create?

What have I tried? I moved the AutoPilots for the topmost levels into the method that calls methodCalculate, so they are created only once, and I create the other AutoPilots inside methodCalculate. This improves speed, but creating AutoPilots in the nested calls further down still ruins performance.

Edit: I could add a List of AutoPilots, or even a Map with a constants class for the keys, to look up the proper AutoPilot for a given select. I do not know which option is best here.

Xelian
  • Have you considered using an array or ArrayList as top-level storage for all the AutoPilot objects? You are on the right path in taking XPath compilation out of the loop, as far as I can tell... – vtd-xml-author Aug 22 '16 at 09:38
  • Please also try to be more concrete when posting a question... I can give you advice, or better, code examples, if you post a detailed example of what you want vtd-xml to do... – vtd-xml-author Aug 22 '16 at 09:39
  • OK, I am looking for advice on how to solve this problem. If I pass a List, how do I know which XPath AutoPilot I need? How can I be more precise? I want to do a transformation like XSLT, but in Java with VTD-XML: I start by iterating over the first-level objects selected by some XPath, then iterate over attributes, sub-elements and so on. As the level increases I need a new select with a new AutoPilot. I will edit my question. – Xelian Aug 22 '16 at 10:08
  • I will work with you, but you must simplify the problem to a point I can help solving the fundamental issue you have... I need an xml and what you want as output... – vtd-xml-author Aug 22 '16 at 20:13
  • Essentially I want you to do a divide-and-conquer on your end, because you will have to go through that to understand your complex task anyway... Do not try to solve every problem at once; isolate them, please. – vtd-xml-author Aug 22 '16 at 20:16
  • The problem you are facing dealing with a lot of xpath is no different from what a compiled xslt has to manage internally.. – vtd-xml-author Aug 23 '16 at 06:53
  • One more thing: 200 MB is a small XML file by vtd-xml's standards – vtd-xml-author Aug 24 '16 at 04:25
  • OK, I know; in our organization we have 2-3 GB XML files, but for my research I need 200 MB, which is big for an average application. The nesting depth is the problem, though. What is the limit with VTDNav and with Huge? Can I increase them somehow? – Xelian Aug 24 '16 at 06:23
  • Does your XML have namespaces? Disable them; with a 64-bit JVM, you can go up to 2 GB with standard vtd-xml – vtd-xml-author Aug 24 '16 at 07:36

1 Answer

AutoPilot compilation should happen outside the loop, but it does not consume a lot of memory. You can probably put all the AutoPilot objects in a hash table and use the XPath string as the hash key.
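To make the hash-table idea concrete, here is a minimal sketch of the pattern: compile each expression once, keep it in a map keyed by its XPath string, and pass a single registry down the call chain instead of one parameter per select. So that it runs with nothing but the JDK, it uses `javax.xml.xpath` as a stand-in rather than VTD-XML; the class names `XPathRegistry`, `Selects` and `Transformer` are hypothetical and only for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: compiles each XPath once and hands it out by its string. */
final class XPathRegistry {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Map<String, XPathExpression> compiled = new HashMap<>();
    private int compileCount = 0; // only here to make the caching observable

    XPathExpression get(String expression) throws XPathExpressionException {
        XPathExpression cached = compiled.get(expression);
        if (cached == null) {
            // Compilation happens at most once per distinct XPath string.
            cached = xpath.compile(expression);
            compiled.put(expression, cached);
            compileCount++;
        }
        return cached;
    }

    int compileCount() {
        return compileCount;
    }
}

/** Hypothetical constants class so callers never retype an XPath string. */
final class Selects {
    static final String OBJECTS = "object[@class='Example']";
    static final String OTHER = "other";

    private Selects() {}
}

/** Nested methods share one registry instead of a growing parameter list. */
final class Transformer {
    private final XPathRegistry registry;

    Transformer(XPathRegistry registry) {
        this.registry = registry;
    }

    /** Top level: selects the objects, then delegates to the next level. */
    int countOthers(Node root) throws XPathExpressionException {
        int total = 0;
        NodeList objects = (NodeList) registry.get(Selects.OBJECTS)
                .evaluate(root, XPathConstants.NODESET);
        for (int i = 0; i < objects.getLength(); i++) {
            total += countChildren(objects.item(i));
        }
        return total;
    }

    /** Next level: relative select, looked up by key, never recompiled. */
    private int countChildren(Node object) throws XPathExpressionException {
        NodeList others = (NodeList) registry.get(Selects.OTHER)
                .evaluate(object, XPathConstants.NODESET);
        return others.getLength();
    }

    /** Small parsing helper for trying the sketch on an in-memory string. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

With VTD-XML the map value would be an `AutoPilot` prepared with `selectXPath(...)`, and adding a new rule would mean adding one constant rather than one more method parameter. Since an `AutoPilot` keeps evaluation state, a reused instance presumably has to be bound to the navigator and reset with `resetXPath()` before each new evaluation.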

As you probably noticed, vtd-xml's XPath evaluation takes place in a loop, which is very different from DOM. This may be a bit difficult to get used to, but it is a large part of how vtd-xml achieves superior performance and low memory usage.

Also, for a simple XPath, such as one involving a single child tag lookup, you can forgo XPath and instead base your app logic on the cursor directly. The same rule applies: make sure the cursor position going into a piece of app logic is unchanged when leaving that code block.

So you have a couple of options. The most common is push()/pop(). But for a simple child lookup, after calling toElement(VTDNav.FIRST_CHILD) all you have to do is call toElement(VTDNav.PARENT) to return to the starting location.

vtd-xml-author