I am working with potentially "large" xml files where my application only cares about a very small subset of the data contained in the file. So I was hoping to avoid loading the entire xml document into DOM.
I have been successfully using Apache Xerces C++ with the SAX2 API to extract data directly from an xml file contained in a zip archive, using custom implementations of xercesc::BinInputStream and xercesc::InputSource.
However, now we want to apply modifications to a small subset of the nodes in the xml document (reading the original, and applying changes into a new xml file in a new zip archive). I was hoping to avoid loading the entire document into DOM just to modify a few nodes.
It would be nice to leverage the work I've already done with SAX2, but it appears that the SAX2 api is primarily oriented around reading documents. I could handle all SAX2 events, and write the information out to the new file as they occur, but I'm having difficulty locating xerces api functionality that would, for example, aid with handling xml entities (I really don't want to rewrite e.g. xml entity handling myself!) and other encoding issues.
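To make the pass-through idea concrete, here is a minimal sketch of what "write each event out as it occurs" might look like. It deliberately does not use Xerces: `escapeXml` and `StreamingXmlWriter` are hypothetical names, the method names only mirror the SAX2 ContentHandler callbacks, and strings are assumed to be already in the output encoding. It escapes only the predefined entities and ignores encoding conversion, namespaces, and whitespace in attribute values, which is the work I would prefer a library to do for me.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a Xerces function): escape the characters that
// must not appear literally in element content or in a double-quoted
// attribute value.
std::string escapeXml(const std::string& in, bool inAttribute) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"':
                // Quotes only need escaping inside attribute values.
                if (inAttribute) out += "&quot;"; else out += c;
                break;
            default: out += c;
        }
    }
    return out;
}

// Hypothetical writer whose methods mirror the SAX2 callbacks, so a real
// handler could forward each event (modified or not) straight to it.
class StreamingXmlWriter {
public:
    using Attributes = std::vector<std::pair<std::string, std::string>>;

    explicit StreamingXmlWriter(std::ostream& out) : out_(out) {}

    void startElement(const std::string& qname, const Attributes& attrs) {
        out_ << '<' << qname;
        for (const auto& a : attrs)
            out_ << ' ' << a.first << "=\"" << escapeXml(a.second, true) << '"';
        out_ << '>';
        ++depth_;
    }

    void characters(const std::string& text) {
        out_ << escapeXml(text, false);
    }

    void endElement(const std::string& qname) {
        assert(depth_ > 0);  // an end tag without a matching start tag is a caller bug
        --depth_;
        out_ << "</" << qname << '>';
    }

private:
    std::ostream& out_;  // any std::ostream; a zip-backed stream is assumed to be possible
    int depth_ = 0;      // number of currently open elements
};
```

Because the writer only depends on `std::ostream`, the sink could in principle be a stream buffer that feeds a zip entry, but that part is an assumption and not shown here.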
I also noticed that xerces provides a xercesc::BinOutputStream (which would appear to be what I would want to derive from in order to directly serialize to a zip archive), but I haven't found a place where I could plug such a custom output stream into the xerces api. I also haven't been able to locate a corresponding output analogue for xercesc::InputSource.
Does Xerces C++ provide any native functionality for writing xml documents in a streaming fashion, i.e. without first building a DOM?