I have millions of XML documents on my filesystem and would like to query data from them quickly. A typical query would fetch data from one or several nodes.

  • I've looked at XML databases, but the ecosystem doesn't look very mature; I'm worried about going down a rabbit hole trying to implement one.
  • I could convert the XML to JSON and put it in a NoSQL database like MongoDB, but I would lose the nested structure of the XML.

Holding all the XML documents in memory and searching them with lxml, for instance, is not an option: there are too many of them.
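To illustrate the constraint, here is a minimal sketch of the naive baseline: parsing one file at a time so memory stays flat, at the cost of rereading every document on each query. It uses the standard library's `xml.etree.ElementTree` rather than lxml, and the function name `scan`, the `docs/` directory, and the `title` element are placeholders, not the real data.

```python
import glob
import xml.etree.ElementTree as ET


def scan(pattern, xpath):
    """Yield (path, text) for every node matching xpath, one file at a time."""
    for path in glob.iglob(pattern, recursive=True):
        try:
            # Only the current document is held in memory.
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip malformed documents
        for node in root.iterfind(xpath):
            yield path, (node.text or "").strip()
```

A call such as `scan("docs/**/*.xml", ".//title")` walks the whole tree of files for every query, with no index to narrow the search, so it is unlikely to be fast enough at this scale.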

What would be practical approaches in this case? Are there any best practices?

Matthieu

1 Answer

Definitely a case for an XML database like BaseX or eXist.

Michael Kay
  • Thanks @michael. I've looked at both of these options, but was somewhat put off by their lack of adoption compared to well-established SQL or NoSQL alternatives. Is that something I should worry about? – Matthieu Sep 16 '22 at 13:14
  • I have no data on the level of adoption of these products; if you have any, please share it. It is very hard to get data on the level of adoption of open source technology. I think you will find that both these products have thriving user communities, however. – Michael Kay Sep 20 '22 at 05:36