
Context: An existing system that is heavily based on passing XML around in various forms (XmlDocument, XDocument/XElement, string-encoded). We are developing a new component that will talk to the existing system and will have its own data store of some kind for holding XML for later processing. MongoDB seems like a good fit for the data store, but it has no native support for XML, so I'm wondering what good options exist for handling XML in MongoDB.

There are two options that come to mind:

1. Use an XML to JSON converter (for conversion in both directions)

I believe this would allow querying of the data and the creation of MongoDB indexes. There isn't an immediate need for lots of querying or for many different types of query, but at the very least we would need key-based retrieval, and one or two queries on the values would be useful (it's certainly worth keeping that option open).

Is a generic XML-to-JSON converter a good fit here, or would a MongoDB/BSON-specific converter be better?

Are there any specific downsides to converting to JSON/BSON? Could it ever result in loss of information? For example, could whitespace within element content get mangled?
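One concrete way information can be lost: converters that map each element name to a single JSON property (as far as I know, Newtonsoft's converter collects same-named siblings into one array) cannot record how differently named siblings were interleaved, and text mixed in between child elements loses its position in the same way. The sketch below is a hypothetical, stdlib-only illustration of that name-keyed shape, not any library's actual code; `OrderLossDemo`, `GroupByName` and `Rebuild` are made-up names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical illustration of the "one property per element name" shape that
// generic XML-to-JSON converters tend to produce. Not any library's actual code.
public static class OrderLossDemo
{
    // Group child elements by name (names kept in first-seen order), which is
    // all a name-keyed JSON object / BSON document can record.
    public static List<(string Name, List<string> Values)> GroupByName(XElement parent)
    {
        var groups = new List<(string Name, List<string> Values)>();
        foreach (XElement child in parent.Elements())
        {
            string name = child.Name.LocalName;
            int i = groups.FindIndex(g => g.Name == name);
            if (i < 0)
                groups.Add((name, new List<string> { child.Value }));
            else
                groups[i].Values.Add(child.Value); // same-named siblings merge into one "array"
        }
        return groups;
    }

    // Rebuild XML from the grouped form: the only order available is group order.
    public static XElement Rebuild(string rootName, List<(string Name, List<string> Values)> groups) =>
        new XElement(rootName,
            groups.SelectMany(g => g.Values.Select(v => new XElement(g.Name, v))));
}
```

Rebuilding `<a><b>1</b><c>2</c><b>3</b></a>` from the grouped form yields `<a><b>1</b><b>3</b><c>2</c></a>`: the same elements and values, but in a different order. Whether that matters depends on whether consumers of the XML rely on sibling order.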

2. Encode the XML as a string (or as binary) and store it in BSON as a string (or byte array).

Pros

  • Simple.

Cons

  • Data becomes opaque to querying.
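A minimal sketch of option 2 using the MongoDB.Bson package, assuming each root element carries an `id` attribute that can serve as the lookup key (the attribute, the `_id`/`xml` field names and the `RawXmlDocuments` class are all hypothetical). The XML itself stays opaque, but copying the key into `_id` keeps key-based retrieval possible, since MongoDB indexes `_id` automatically.

```csharp
using System;
using System.Xml.Linq;
using MongoDB.Bson;

// Sketch of option 2: the XML is stored verbatim as a BSON string, and only the
// key needed for retrieval is copied out into a normal field.
// The "id" attribute and the "_id"/"xml" field names are hypothetical.
public static class RawXmlDocuments
{
    public static BsonDocument ToBson(XElement root)
    {
        string key = root.Attribute("id")?.Value
            ?? throw new ArgumentException("root element has no id attribute");

        return new BsonDocument
        {
            { "_id", key },                                         // queryable, indexed by default
            { "xml", root.ToString(SaveOptions.DisableFormatting) } // opaque to queries
        };
    }

    public static XElement FromBson(BsonDocument doc) =>
        // PreserveWhitespace keeps whitespace-only text nodes exactly as stored.
        XElement.Parse(doc["xml"].AsString, LoadOptions.PreserveWhitespace);
}
```

If the source XML is also loaded with `LoadOptions.PreserveWhitespace`, whitespace survives the whole round trip, because nothing is ever reinterpreted.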

Are there additional pros/cons to the above two options? Are there other options available? Is this sane?! (e.g. is there a better fit for this problem than MongoDB?)

=== UPDATE ===

A working demo that uses Newtonsoft.Json for the XML-to-JSON conversion:

using System.Xml.Linq;
using MongoDB.Bson;
using Newtonsoft.Json;

XElement fooElem = XElement.Load("foo.xml");
// Formatting.Indented makes the JSON readable while debugging; otherwise it only adds unnecessary whitespace.
string jsonStr = JsonConvert.SerializeXNode(fooElem, Formatting.Indented);
BsonDocument bsonDoc = BsonDocument.Parse(jsonStr);

From there you can just call MongoDB as usual, e.g.:

await collection.InsertOneAsync(bsonDoc);
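Going the other way (BsonDocument back to XML) can reuse Newtonsoft. The sketch below is an assumption about how that might look, not a tested recipe: it drops the `_id` field that MongoDB adds on insert, since that field was never part of the XML, and hands the remaining JSON to `JsonConvert.DeserializeXNode`. It needs both the Newtonsoft.Json and MongoDB.Bson packages; `XmlFromMongo` is a hypothetical name.

```csharp
using System.Xml.Linq;
using MongoDB.Bson;
using Newtonsoft.Json;

public static class XmlFromMongo
{
    // Reverses SerializeXNode + BsonDocument.Parse for a document read back from MongoDB.
    public static XDocument ToXml(BsonDocument stored)
    {
        // Work on a copy so the caller's document keeps its _id.
        var copy = (BsonDocument)stored.DeepClone();

        // MongoDB adds "_id" on insert if it is missing; it was never part of the XML.
        copy.Remove("_id");

        // SerializeXNode only produces strings, nulls, objects and arrays, and
        // ToJson() writes those as plain JSON that Newtonsoft can read back.
        return JsonConvert.DeserializeXNode(copy.ToJson());
    }
}
```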

This is probably an OK/acceptable solution in my particular case, but more generally it has the overhead of converting to and then parsing a JSON string, which is unnecessary work. Ideally we would go from XElement direct to a BsonDocument.
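One way to skip the JSON string entirely is to walk the XElement tree and build the BsonDocument directly. The sketch below is a hypothetical converter (`XmlToBson` is not a library class) that loosely imitates Newtonsoft's layout: attributes become `@name` fields, text sitting next to attributes or child elements becomes `#text`, and repeated element names become arrays. Those prefixes are what let attributes and text be told apart from child elements when converting back. The output is not guaranteed to match Newtonsoft's exactly, and namespaces, comments and processing instructions are ignored.

```csharp
using System.Xml.Linq;
using MongoDB.Bson;

// Hypothetical direct XElement -> BsonDocument converter (no JSON string in between).
// Loosely imitates Newtonsoft's layout; namespaces, comments and processing
// instructions are ignored.
public static class XmlToBson
{
    // Wraps the root so the result looks like { "foo": { ... } }.
    public static BsonDocument Convert(XElement root) =>
        new BsonDocument(root.Name.LocalName, ConvertElement(root));

    private static BsonValue ConvertElement(XElement element)
    {
        // Leaf element without attributes: just its text, or null for <empty/>.
        if (!element.HasElements && !element.HasAttributes)
        {
            if (element.IsEmpty) return BsonNull.Value;
            return new BsonString(element.Value);
        }

        var doc = new BsonDocument();
        foreach (XAttribute attr in element.Attributes())
        {
            if (attr.IsNamespaceDeclaration) continue; // xmlns declarations are skipped
            doc.Add("@" + attr.Name.LocalName, attr.Value);
        }

        // Child elements and text; repeated names are collected into arrays.
        foreach (XNode node in element.Nodes())
        {
            if (node is XElement child)
                AddOrAppend(doc, child.Name.LocalName, ConvertElement(child));
            else if (node is XText text) // also matches CDATA, since XCData derives from XText
                AddOrAppend(doc, "#text", text.Value);
        }
        return doc;
    }

    private static void AddOrAppend(BsonDocument doc, string name, BsonValue value)
    {
        if (!doc.TryGetValue(name, out BsonValue existing))
            doc.Add(name, value);                          // first occurrence
        else if (existing is BsonArray array)
            array.Add(value);                              // third and later occurrences
        else
            doc[name] = new BsonArray { existing, value }; // second occurrence
    }
}
```

It would slot in where `BsonDocument.Parse(jsonStr)` is used above, e.g. `await collection.InsertOneAsync(XmlToBson.Convert(fooElem));`.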

redcalx
  • This actually works. The only problem (or is it supposed to be like that?) is that it adds an @ or # prefix to some names. How do you handle that? – Vulovic Vukasin May 18 '17 at 07:44

1 Answer


IMO, you make a good point about avoiding the need to parse JSON before persisting data to MongoDB. You may or may not find existing commercial .NET libraries that already tackle this general problem.

Should you write your own implementation: FWIW, I've recently been thinking about a general, document-order-friendly encoding of XML within JSON, which I believe is round-trippable (with or without XML namespaces) and may give you some ideas.

Here's the proof of concept, in my answer to another question ("xml to json mapping challenge"):

https://stackoverflow.com/a/35810403/1409653
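To give a rough idea of what an order-friendly encoding can look like, here is a hypothetical layout, not the one from the linked answer: each element becomes `{ n: name, a: attributes, c: children }`, and text nodes are kept as plain strings inside the `c` array, so document order and mixed content survive a round trip. The sketch targets MongoDB.Bson directly; `OrderedXmlBson` and the short field names are made up, and namespaces, comments and processing instructions are not handled.

```csharp
using System.Linq;
using System.Xml.Linq;
using MongoDB.Bson;

// Hypothetical order-preserving layout (NOT the encoding from the linked answer):
//   element -> { "n": local name, "a": { attribute: value, ... }, "c": [ children ] }
//   text    -> a plain BSON string inside "c"
// Namespaces, comments and processing instructions are ignored.
public static class OrderedXmlBson
{
    public static BsonDocument Encode(XElement element)
    {
        var attrs = new BsonDocument();
        foreach (XAttribute attr in element.Attributes().Where(x => !x.IsNamespaceDeclaration))
            attrs.Add(attr.Name.LocalName, attr.Value);

        // One array for all child nodes keeps their document order, including
        // text that is interleaved with elements (mixed content).
        var children = new BsonArray();
        foreach (XNode node in element.Nodes())
        {
            if (node is XElement child) children.Add(Encode(child));
            else if (node is XText text) children.Add(text.Value);
        }

        return new BsonDocument { { "n", element.Name.LocalName }, { "a", attrs }, { "c", children } };
    }

    public static XElement Decode(BsonDocument doc)
    {
        var element = new XElement(doc["n"].AsString);
        foreach (BsonElement attr in doc["a"].AsBsonDocument)
            element.SetAttributeValue(attr.Name, attr.Value.AsString);
        foreach (BsonValue child in doc["c"].AsBsonArray)
            element.Add(child.IsString ? new XText(child.AsString) : (XNode)Decode(child.AsBsonDocument));
        return element;
    }
}
```

The trade-off is querying: values end up behind `n` names inside nested `c` arrays rather than plain field paths such as `foo.bar`, so queries and indexes are harder to write than with the name-keyed shape.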

HTH,

YSharp