
I need to parse a huge XML file on the server and send it to the client.
I want to do the parsing on demand: at first, parse and show only the parent nodes. When the client clicks on a parent node, a request is sent to the server saying which parent was selected, and only then are its children parsed and sent (again, not the whole subtree, just the next level down).
I thought about using a StAX parser, but I don't understand how to work with it when it comes to parent-child relationships. How do I tell the parser not to continue to the next START_ELEMENT (the child), but to skip to the next parent on the same level? Also, is there a way to go back with the StAX iterator API? After choosing one parent and seeing its children, can I go back and see a previous parent?
I'd really appreciate any suggestions!
Thank you.

user1579191
  • I need exactly the same thing: to get only the branch of a given parent. However, this can be done with a DOM parser. – To Kra May 06 '15 at 13:28

1 Answer

  1. No, you can't skip a sub-tree of an XML document without parsing it first. That is true for every parser, not just StAX. (Knowing which point to skip to implies that you've already parsed the elements in between.)

  2. However, by maintaining a nesting-level counter that you increment on every start-element event and decrement on every end-element event, it's easy to ignore all the events that come from levels below your target level.

  3. Parsing is one-way, not random access: you can't jump back and forth. (Again, this would assume that the parser stores a representation of everything parsed so far, which is exactly what StAX was created to avoid.) But of course you can try to record the byte position of each parent tag in the file, then later seek to it if you've got the file open for random access. There are quite a few pitfalls to this approach, though.
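A minimal sketch of the depth counter from point 2, assuming the standard `javax.xml.stream` cursor API (`XMLStreamReader`). The class and method names (`LevelLister`, `namesAtDepth`) are hypothetical. Note that the parser still reads every event; the counter only decides which events you act on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class LevelLister {

    /**
     * Returns the local names of all elements at the given depth
     * (1 = root element, 2 = children of the root, ...).
     * Deeper elements are still read by the parser, just ignored.
     */
    public static List<String> namesAtDepth(String xml, int targetDepth) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        int depth = 0; // the single nesting-level counter
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == targetDepth) {
                        names.add(reader.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><a><x/><y/></a><b><z/></b></root>";
        // Only the direct children of <root>; <x>, <y>, <z> are skipped.
        System.out.println(namesAtDepth(xml, 2)); // prints [a, b]
    }
}
```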

All in all, your use case doesn't look like a good fit for StAX. Have you tried VTD-XML? Depending on how big your file is, it can be exactly what you want.

biziclop
  • Thanks for your detailed answer! I have a few questions, though: 1) Can I search for the next parent (sibling) without parsing the subtree, just reading it, i.e. without doing a full parse? Or does going over the subtree mean parsing it? 2) I'm not sure I need random access. I want to show one specific level at a time: when a node is clicked, show its first-level children. Is random access required for that? Do I need XPath for this, and if so, can I combine it with StAX? Or am I better off looking for other parsers? 3) Is VTD-XML good for 1 GB files? Because that's what I've got... – user1579191 Oct 29 '12 at 09:49
  • 1. Yes, I considered going over a subtree as parsing. Of course you don't need to record anything about these nodes (apart from the depth counter I mentioned, which is a single global `int`). 2. That's effectively random access, as every time a user clicks on a node, you have to start processing from a different location. 3. As far as I can tell, yes. But I'm not involved with that project and I never tried using it on files that big. – biziclop Oct 29 '12 at 13:04
  • Thanks! Last question (I hope...), just to check I understood correctly: with StAX, it will take a lot of time to parse all the data every time I click on a node, but on the other hand it doesn't use much memory (it doesn't keep any of the data, so memory use is the same whether I parse a small file or a huge one?). VTD-XML uses a lot of memory (at least, a lot in my case), but it will only be slow on the first click (when it parses the file for the first and only time) and pretty fast afterwards. Right? Thanks again, you really helped me! – user1579191 Oct 30 '12 at 05:21
  • Yes, that is correct. I think the best thing to do is to write a short test for both solutions and see for yourself how long it takes. Who knows, one or both might even be a lot faster than what you're expecting. – biziclop Oct 30 '12 at 15:26
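To illustrate what "start processing from a different location" on every click means with StAX, here is a sketch, assuming the `javax.xml.stream` cursor API, that serves one click by re-streaming the document from the start. The selected node is identified by a path of 0-based child indices below the root. That is a hypothetical representation chosen for this example (the client would have to send something like it), and `ChildLister`/`childrenOf` are illustrative names. Going "back" to a previous parent is just another call with that parent's path. The scan stops as soon as the selected element closes, so nodes near the start of the file come back sooner than nodes near the end. For a real 1 GB file you would pass a `FileInputStream` instead of a `StringReader`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ChildLister {

    /**
     * Hypothetical helper: re-reads the document from the beginning and returns
     * the local names of the direct children of the element reached by following
     * `path` (0-based child indices below the root). An empty path selects the root.
     * Returns an empty list if the path does not exist.
     */
    public static List<String> childrenOf(String xml, int[] path) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> children = new ArrayList<>();
        int depth = 0;    // nesting-level counter (root element = depth 1)
        int matched = 0;  // number of path steps matched; deepest match sits at depth matched + 1
        int sibling = 0;  // index of the next child seen under the deepest matched element
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == matched + 2) { // direct child of the deepest matched element
                        if (matched == path.length) {
                            children.add(reader.getLocalName()); // inside the selected node
                        } else if (sibling == path[matched]) {
                            matched++;   // descend one more step along the path
                            sibling = 0; // start counting this element's children
                        } else {
                            sibling++;
                        }
                    }
                    // Anything deeper is read but ignored.
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == matched + 1) {
                        break; // selected node (or last matched ancestor) closed: stop early
                    }
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
        return children;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><a><x/><y/></a><b><z/></b></root>";
        System.out.println(childrenOf(xml, new int[] {}));  // prints [a, b]
        System.out.println(childrenOf(xml, new int[] {1})); // prints [z]
        // "Going back" to <a> is simply another full re-scan:
        System.out.println(childrenOf(xml, new int[] {0})); // prints [x, y]
    }
}
```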