Our program currently uses the libxml2 DOM API (xmlReadFile
) to load an entire file into memory. Unfortunately, this breaks down on "large" XML files, as the basic memory consumption of libxml2 DOM is about 4-5 times the base file size.
It seems libxml2 offers two APIs for reading XML when I don't want to store the whole tree in memory: SAX2 and xmlReader.
I haven't dug into the APIs yet, but I'm wondering which one is preferable under which circumstances?
Note: All I need to do with the XML file is populate some C++ datastructures with the data found in the XML file. And these will in turn be a lot smaller than the (very verbose) XML definition. At the moment, with xmlReadFile
and the DOM API the process takes about 100MB memory for a 20MB XML file. The C++ data in memory for such a file is more like 5MB -- so I could go from 1:4 to 4:1, which would already help a lot.