I've got millions of XML documents on my filesystem and would like to query data from them quickly. A typical query would fetch the data in one or more nodes.
- I've looked at XML databases, but the ecosystem doesn't look very mature; I'm worried about going down a rabbit hole trying to implement one.
- I could convert the XML to JSON and put it in a NoSQL database like MongoDB, but I would be losing the nested structure of the XML.
Keeping all the XML documents in memory and searching them with lxml, for instance, is not an option: there are too many of them.
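To make the access pattern concrete, here is a minimal sketch of the kind of lookup I mean. It uses the standard library's `xml.etree.ElementTree` rather than lxml so that it is self-contained, and the function name `find_values` is just a placeholder. It re-parses every file on each query, which is exactly what doesn't scale to millions of documents.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def find_values(paths: Iterable[Path], node_path: str) -> Iterator[Tuple[Path, str]]:
    """Yield (file, text) for every node matching node_path in each document.

    node_path uses the limited XPath subset supported by ElementTree,
    e.g. ".//title" for any <title> element below the root.
    """
    for path in paths:
        # The whole file is parsed again on every query: no index, no cache.
        root = ET.parse(path).getroot()
        for node in root.findall(node_path):
            yield path, (node.text or "")
```

Called as something like `find_values(Path("docs").rglob("*.xml"), ".//title")`, where the `docs` directory and the `title` element are made-up examples.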
What would be the practical approaches in this case? Are there any best practices?