I have a set of tools that index large XML files (MediaWiki dump files) and use those indexes for random access to the individual records stored in the files. This works very well, but I'm "parsing" the XML with string functions and/or regular expressions rather than a real XML parser, which is a fragile solution should the way the files are created change in the future.
Do some or most XML parsers offer a way to build such an index and then seek directly to individual records?
(I have versions of my tools written in C, Perl, and Python. Parsing the entire files into some kind of database or mapping them into memory are not options.)
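To make the idea concrete, here is a minimal Python sketch of the kind of parser-based indexing being asked about. It is not the existing tool: it uses the standard library's `xml.parsers.expat` and its `CurrentByteIndex` attribute to record where each record starts and ends. The function names `build_index` and `read_record` are hypothetical, and the sketch assumes the file is UTF-8, that records are non-nested `<page>` elements, and that each end tag is written exactly as `</page>` with no whitespace inside it.

```python
import io
import xml.parsers.expat


def build_index(stream, tag="page"):
    """Stream-parse a binary file object and return (offset, length) byte spans,
    one per <tag> element. The file is never loaded into memory as a whole."""
    parser = xml.parsers.expat.ParserCreate()
    spans = []
    starts = []  # offsets of currently open <tag> elements

    def on_start(name, attrs):
        if name == tag:
            # Inside a handler, CurrentByteIndex is the offset of the "<"
            # that begins the start tag.
            starts.append(parser.CurrentByteIndex)

    def on_end(name):
        if name == tag:
            start = starts.pop()
            # Here CurrentByteIndex is the offset of the "<" of the end tag.
            # Assumption: the end tag is literally "</tag>" with no whitespace.
            end = parser.CurrentByteIndex + len(("</" + tag + ">").encode("utf-8"))
            spans.append((start, end - start))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.ParseFile(stream)  # reads the stream in chunks
    return spans


def read_record(stream, span):
    """Random access: seek to a recorded span and return its raw bytes."""
    offset, length = span
    stream.seek(offset)
    return stream.read(length)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a dump file; a real one would be open(path, "rb").
    data = "<mediawiki><page><title>A</title></page></mediawiki>".encode("utf-8")
    index = build_index(io.BytesIO(data))
    print(read_record(io.BytesIO(data), index[0]))
```

The spans could be written to a side file once per dump and reused, so only the indexing pass needs the parser; later lookups are a plain seek and read.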
UPDATE
Here are rough statistics for comparison: the files I use are mostly published every week or so, and the current one is 1,918,212,991 bytes. The C version of my indexing tool takes a few minutes on my netbook and only has to be run once for each newly published XML file. Less often, I use the same tools on another XML file whose current size is 30,565,654,976 bytes and which was updated only 8 times in 2010.