I'm working with several huge (>2 GB) XML files, and their size is causing problems.
(For example, I'm using XMLReader in a PHP script to parse smaller ~500 MB files, and that works fine, but 32-bit PHP can't open files this large.)
So my idea is to eliminate the big chunks of the file that I know I don't need.
For example, if the structure of the file looks like this:
<record id="1">
  <a>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </a>
  <b>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </b>
  <c>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </c>
</record>
...
<record id="999999">
  <a>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </a>
  <b>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </b>
  <c>
    <detail>blah</detail>
    ....
    <detail>blah</detail>
  </c>
</record>
For my purposes, I only need the data in the parent node <a> of each record. If I could eliminate the parent nodes <b> and <c> from every record, the file would shrink substantially, hopefully to the point where it's small enough to work with normally.
What's the best way to do something like this (hopefully with something like sed or grep, or a free/cheap application)?
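Since each tag in my sample sits alone on its own line, I've been wondering whether a line-oriented sed range delete would be enough. A sketch of what I mean (this assumes every <b>…</b> and <c>…</c> block starts and ends on its own line, with no attributes on those tags and no nesting):

```shell
# Delete every line from an opening <b> through the matching </b>,
# and likewise for <c>; all other lines pass through unchanged.
# sed streams line by line, so it never loads the whole file into memory.
# NOTE: this is purely textual, not XML-aware -- a <b> with attributes,
# a tag sharing a line with other content, or "<b>" inside CDATA would break it.
sed '/<b>/,/<\/b>/d; /<c>/,/<\/c>/d' huge.xml > trimmed.xml
```

But I'm not sure how robust that is on real-world files, which is why I'm asking what the best approach would be.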
I've tried a trial version of Altova XML Spy, and it won't even open the file (I assume because it's too large).