Suppose I have a very big XML file whose entries have <id> tags or id="" attributes.
How can I search by this id? Can I create some kind of search index?
Currently I am using org.w3c.dom. Does it have any means of searching?
UPDATE
My big XML file is a downloaded Wikipedia dump. It is 40 GB in size and has millions of records.
Is it possible to index it with something like Lucene and then search for IDs fast?
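To make the indexing idea concrete, here is a minimal sketch of what I have in mind. It does not use Lucene; it streams the file once with StAX (from the JDK) and fills an in-memory map from id to title. The class name IdIndexSketch and the method buildIndex are made up for illustration, and the layout is an assumption: each <page> element has <title> and <id> as direct children, and nested ids (such as the one under <revision>) are skipped. For the full 40 GB dump the map would probably have to live in an on-disk store or a Lucene index rather than a HashMap.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: one streaming pass that maps page id -> page title.
public class IdIndexSketch {

    public static Map<String, String> buildIndex(InputStream in) throws XMLStreamException {
        Map<String, String> index = new HashMap<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD processing needed
        XMLStreamReader reader = factory.createXMLStreamReader(in);

        int depth = 0;       // current element nesting depth
        int pageDepth = -1;  // depth of the enclosing <page>, or -1 when outside one
        String title = null;
        String id = null;

        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                String name = reader.getLocalName();
                if (name.equals("page")) {
                    pageDepth = depth;
                    title = null;
                    id = null;
                } else if (pageDepth > 0 && depth == pageDepth + 1) {
                    // Only direct children of <page>; ids nested deeper are ignored.
                    if (name.equals("title")) {
                        title = reader.getElementText(); // consumes the end tag too
                        depth--;
                    } else if (name.equals("id") && id == null) {
                        id = reader.getElementText();
                        depth--;
                    }
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (depth == pageDepth) {
                    if (id != null) {
                        index.put(id, title);
                    }
                    pageDepth = -1;
                }
                depth--;
            }
        }
        reader.close();
        return index;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny in-memory sample standing in for the real dump.
        String xml = "<mediawiki><page><title>A</title><id>1</id>"
                + "<revision><id>99</id></revision></page></mediawiki>";
        Map<String, String> index =
                buildIndex(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(index.get("1"));
    }
}
```

With the real file, the ByteArrayInputStream would be replaced by a buffered FileInputStream; after the single pass, a lookup by id is a plain map access rather than another scan.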
UPDATE2
I have tried BaseX. It ingested my XML and created a database of 32 GB. I have not figured out whether it truncated the data or whether the 32 GB is the result of some compression.
Unfortunately, searching by ID takes 70-80 seconds or longer, which is slower than a MediaWiki API query.