Context: An existing system that is heavily based on passing XML around in various forms (XmlDocument, XDocument/XElement, string encoded). We are developing a new component that will talk to the existing system and will have its own data store of some kind for holding XML for later processing. MongoDB seems like a good fit for the data store, but it doesn't have native support for XML, so I'm wondering what good options exist for handling XML in MongoDB.
There are two options that come to mind:
1. Use an XML to JSON converter (for conversion in both directions)
I believe this would allow the data to be queried and MongoDB indexes to be created on it. There is no immediate need for heavy or varied querying, but at the very least we need key-based retrieval, and one or two queries on values would be useful (it would certainly be good to keep that option open).
Is a generic XML-2-JSON converter a good fit here, or would a MongoDB/BSON converter be better?
Are there any specific downsides to converting to JSON/BSON? Could it ever result in loss of information? For example, could whitespace inside element content get mangled?
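One way I could probe the information-loss question empirically is a round-trip check: convert to JSON, convert back, and compare the trees. This is only a minimal sketch; `RoundTripCheck` and `SurvivesRoundTrip` are hypothetical names of mine, and it assumes Newtonsoft.Json's `SerializeXNode`/`DeserializeXNode` pair is the converter in use.

```csharp
using System.Xml.Linq;
using Newtonsoft.Json;

public static class RoundTripCheck
{
    // Hypothetical helper: returns true if the element comes back
    // structurally identical after XML -> JSON -> XML.
    public static bool SurvivesRoundTrip(XElement original)
    {
        string json = JsonConvert.SerializeXNode(original);
        XDocument restored = JsonConvert.DeserializeXNode(json);

        // XNode.DeepEquals compares names, attributes, text and child order.
        return XNode.DeepEquals(original, restored.Root);
    }
}
```

Inputs that seem worth feeding through this are documents loaded with `LoadOptions.PreserveWhitespace`, mixed content (text interleaved with child elements), and repeated elements interleaved with differently named siblings, since those are the cases where I would expect a JSON mapping to struggle.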
2. String (or binary) encode the XML and store it as a BSON byte array.
Pros
- Simple.
Cons
- Data becomes opaque to querying.
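For comparison, option 2 could look something like the sketch below, which keeps a key for retrieval and stores the XML as UTF-8 bytes. The class name, the method names and the `xml` field name are my own placeholders, not anything prescribed by MongoDB.

```csharp
using System.Text;
using System.Xml.Linq;
using MongoDB.Bson;

public static class OpaqueXmlStore
{
    // Hypothetical helper: wraps the XML as an opaque binary field next to a key.
    public static BsonDocument Wrap(string key, XElement xml)
    {
        // DisableFormatting writes the tree without adding indentation.
        byte[] bytes = Encoding.UTF8.GetBytes(xml.ToString(SaveOptions.DisableFormatting));
        return new BsonDocument
        {
            { "_id", key },                       // key-based retrieval still works
            { "xml", new BsonBinaryData(bytes) }  // content itself is not queryable
        };
    }

    // Hypothetical helper: reads the bytes back into an XElement.
    public static XElement Unwrap(BsonDocument doc) =>
        XElement.Parse(Encoding.UTF8.GetString(doc["xml"].AsByteArray),
                       LoadOptions.PreserveWhitespace);
}
```

Any values that need to be queried would have to be copied out into their own fields alongside `xml` at write time.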
Are there additional pros/cons to the above two options? Are there other options available? Is this sane?! (e.g. is there a better fit for this problem than MongoDB?)
=== UPDATE ===
A working demo that uses Newtonsoft.Json for the XML to JSON conversion...
using MongoDB.Bson;
using Newtonsoft.Json;
using System.Xml.Linq;

XElement fooElem = XElement.Load("foo.xml");
// Formatting.Indented is used only to make the JSON readable while debugging; Formatting.None avoids the extra whitespace characters.
string jsonStr = JsonConvert.SerializeXNode(fooElem, Formatting.Indented);
BsonDocument bsonDoc = BsonDocument.Parse(jsonStr);
From there you can just call MongoDB as usual, e.g.:
await collection.InsertOneAsync(bsonDoc);
This is probably an acceptable solution in my particular case, but more generally it carries the overhead of serializing to a JSON string and then parsing that string again, which is unnecessary work. Ideally we would go directly from an XElement to a BsonDocument.
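A direct conversion might be sketched roughly as follows. This is an untested outline under my own assumptions, not an existing library API: `XmlToBson` is a hypothetical class, the `@` prefix for attributes and the `#text` key simply mimic the Newtonsoft.Json naming, namespaces are dropped (only local names are used, so two names that differ only by namespace would collide), and the position of text within mixed content is not preserved.

```csharp
using System.Linq;
using System.Xml.Linq;
using MongoDB.Bson;

public static class XmlToBson
{
    // Hypothetical entry point: wraps the root element as { rootName: ... }.
    public static BsonDocument ToBsonDocument(XElement root) =>
        new BsonDocument(root.Name.LocalName, ToBsonValue(root));

    private static BsonValue ToBsonValue(XElement elem)
    {
        // Leaf element without attributes: store its text as a plain string.
        if (!elem.HasAttributes && !elem.HasElements)
            return new BsonString(elem.Value);

        var doc = new BsonDocument();

        // Attributes become "@name" fields; xmlns declarations are skipped.
        foreach (XAttribute attr in elem.Attributes().Where(a => !a.IsNamespaceDeclaration))
            doc.Add("@" + attr.Name.LocalName, attr.Value);

        // Child elements are grouped by name; repeated names become an array.
        foreach (var group in elem.Elements().GroupBy(e => e.Name.LocalName))
        {
            var values = group.Select(ToBsonValue).ToList();
            doc.Add(group.Key, values.Count == 1 ? values[0] : new BsonArray(values));
        }

        // Direct text nodes are concatenated under "#text".
        string text = string.Concat(elem.Nodes().OfType<XText>().Select(t => t.Value));
        if (text.Length > 0)
            doc.Add("#text", text);

        return doc;
    }
}
```

With something like this, `collection.InsertOneAsync(XmlToBson.ToBsonDocument(fooElem))` would skip the intermediate JSON string, at the cost of maintaining the mapping rules myself.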