I have a very big XML file (500 MB). Is it possible to keep track of the position of the last parsed element? That way, if I have successfully parsed half of it and the JVM then crashes abruptly, I can resume immediately from the position where I left off.
You can obtain the [`Location`](http://woodstox.codehaus.org/javadoc/stax-api/1.0/javax/xml/stream/Location.html) from any `XMLEvent` and store that somewhere, but that doesn't contain enough information to restart a reader at the given position. It would at least let you fast-forward through the document until you're back where you left off. – Barend Dec 08 '11 at 10:13
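Here is a minimal sketch of the fast-forward idea, with one deliberate swap: instead of storing a `Location`, it persists the number of `<child>` elements already handled, because a plain count is trivial to compare on the next run. The class and method names, the `child`/`id` names (borrowed from the example in the answer), and the checkpoint file are all assumptions for illustration. On restart the parser still reads the skipped part of the file, but does no work on it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: resumes by counting completed <child> elements.
public class ResumableChildParser {

    /** Number of <child> elements completed by earlier runs (0 if no checkpoint yet). */
    public static int loadCheckpoint(Path file) throws IOException {
        return Files.exists(file) ? Integer.parseInt(Files.readString(file).trim()) : 0;
    }

    /** Writes a temp file and renames it over the old checkpoint, so a crash
     *  mid-write never leaves a truncated value. The data is not fsynced. */
    public static void saveCheckpoint(Path file, int completed) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.writeString(tmp, Integer.toString(completed));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }

    /** Skips the <child> elements counted in the checkpoint, handles the rest,
     *  and returns the ids handled in this run. */
    public static List<String> process(Reader in, Path checkpoint)
            throws IOException, XMLStreamException {
        int alreadyDone = loadCheckpoint(checkpoint);
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> handled = new ArrayList<>();
        int seen = 0; // <child> start tags encountered in this pass
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "child".equals(reader.getLocalName())) {
                    seen++;
                    if (seen <= alreadyDone) {
                        continue; // fast-forward: finished in a previous run
                    }
                    // Real work on the child (e.g. reading its subtree) would go here,
                    // before the checkpoint is advanced.
                    handled.add(reader.getAttributeValue(null, "id"));
                    saveCheckpoint(checkpoint, seen);
                }
            }
        } finally {
            reader.close();
        }
        return handled;
    }

    public static void main(String[] args) throws Exception {
        Path ckpt = Files.createTempDirectory("xml-resume").resolve("checkpoint");
        String xml = "<root><child id=\"1\"/><child id=\"2\"/><child id=\"3\"/></root>";
        saveCheckpoint(ckpt, 1); // pretend a previous run finished child 1
        System.out.println(process(new StringReader(xml), ckpt)); // [2, 3]
        System.out.println(loadCheckpoint(ckpt));                 // 3
    }
}
```

The write-then-rename step protects the checkpoint against a JVM crash, but not necessarily against a power failure, since nothing forces the file to disk. For the real 500 MB file you would pass something like `Files.newBufferedReader(path, StandardCharsets.UTF_8)` (if the file is UTF-8), and you might save the checkpoint every N children rather than after each one to limit the extra I/O.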
You could presumably write some form of history store that records the structure up to the point you've parsed. However, I suspect that to continue parsing from that point you would have to turn off all forms of validation on your parser: XML is intended to guarantee the structure and contents of a document from head to foot, and it's not really designed for ad-hoc parsing.
In your case you would still need to provide some context, perhaps by keeping the current working element tree in memory, concatenating it with the relevant header information, and parsing as if you were starting over with a new file, submitting only the outstanding content instead of the whole file.
For example, given the XML structure:

    <root>
      <child id="1">
        <subchild id="1"/>
      </child>
      <child id="2">
        <subchild id="2"/>
        <subchild id="3"/>
      </child>
    </root>

If your parser crashes after parsing `<child id="1">`, you need to craft a new pseudo-document containing a `<root>` element, and also keep note of the fact that you have already parsed child 1 when you resume processing, in case of any dependency issues.
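The pseudo-document step can be sketched with StAX too. The helper below (hypothetical names, no namespace support) records which start tags are still open when the N-th `</child>` closes; resuming then means writing those tags, after any header such as the XML declaration or DOCTYPE, in front of the unparsed remainder. Where to cut the remainder is left to the caller. `Location.getCharacterOffset()` from the comment above is one candidate, but it is a byte offset for byte streams and a character offset for character streams, returns -1 when unavailable, and implementations differ on which edge of the event it reports, so check yours before relying on it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: rebuilds the ancestor context for a resumed parse.
public class ResumeContext {

    /** Escapes a value for use inside a double-quoted attribute. */
    static String escapeAttr(String v) {
        return v.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Parses until the count-th </elementName> and returns the start tags still
     *  open at that point, outermost first. Assumes no namespaces. */
    public static List<String> openTagsAfter(String xml, String elementName, int count)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<String> open = new ArrayDeque<>(); // innermost tag on top
        int closed = 0;
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    StringBuilder tag = new StringBuilder("<").append(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        tag.append(' ').append(r.getAttributeLocalName(i)).append("=\"")
                           .append(escapeAttr(r.getAttributeValue(i))).append('"');
                    }
                    open.push(tag.append('>').toString());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.pop();
                    if (r.getLocalName().equals(elementName) && ++closed == count) {
                        List<String> outerFirst = new ArrayList<>(open);
                        Collections.reverse(outerFirst);
                        return outerFirst;
                    }
                }
            }
        } finally {
            r.close();
        }
        throw new IllegalArgumentException("fewer than " + count + " </" + elementName + ">");
    }

    /** Header (may be empty) + re-opened ancestors + unparsed remainder. */
    public static String resumeDocument(String header, List<String> openTags, String remainder) {
        return header + String.join("", openTags) + remainder;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><child id=\"1\"><subchild id=\"1\"/></child>"
                + "<child id=\"2\"><subchild id=\"2\"/><subchild id=\"3\"/></child></root>";
        List<String> open = openTagsAfter(xml, "child", 1); // [<root>]
        String cut = "</child>";
        String rest = xml.substring(xml.indexOf(cut) + cut.length());
        System.out.println(resumeDocument("", open, rest));
        // <root><child id="2"><subchild id="2"/><subchild id="3"/></child></root>
    }
}
```

In `main` the remainder is cut with `indexOf` only to keep the example self-contained; in the real job the cut point would come from whatever position you saved at the checkpoint.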
