Deserialize XML without knowing type

Question

I have a database column that contains XML data as a string. Since I do not know the actual type represented by this XML, I want to read the root tag of the XML and then deserialize the complete content as that type. Because the embedded XML may be quite large and the table contains several thousand of these objects, I need a fast solution. My first approach was to extract the root tag with some string magic (probably Regex), get the type by calling Type.GetType, and then create the serializer for this type. But then I had a look at XmlReader, which also has a ValueType property.

using (XmlReader reader = XmlReader.Create(new StringReader(myXmlAsString)))
{
    reader.MoveToContent(); // get the root-element
    Type type = reader.ValueType;

    XmlSerializer ser = new XmlSerializer(type);
    return ser.Deserialize(reader);
}

The problem is that reader.ValueType always returns the string type rather than the type represented by the root tag.

Finally: which of the two solutions would be faster? The bottleneck of the first is presumably the regex engine used to get the type name; for the second approach it might be the reader operations.

score 2 · Answer 1 · edited May 23 '17 at 12:22

The XML doesn't have a type (it's just structured text), so which type you want to use to deserialize is up to you, not the XML. This is why you have to pass a type to XmlSerializer, and it's why XmlReader cannot return a type name, even if it wanted to. If you check the serialized XML, you'll see that no .NET type name is included (unless you included it yourself).

Using XmlReader to get the root element name is a good approach. Certainly, you should absolutely not use a Regex for this, since XML is not a regular language -- speed is not important if the solution isn't correct. However, you should use reader.MoveToContent() to get at the root and not hard-coded .Read() calls to skip a particular number of nodes.
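A minimal sketch of that approach, assuming the root element name matches a known class name. The Customer and Order classes, the XmlTypeResolver helper and its KnownTypes whitelist are hypothetical names introduced for illustration; they are not part of the question. An explicit dictionary is used here instead of Type.GetType so that only expected types can be instantiated from database content.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical payload types, for illustration only.
public class Customer { public string Name { get; set; } }
public class Order { public int Id { get; set; } }

public static class XmlTypeResolver
{
    // Assumption: root element names map one-to-one to known CLR types.
    private static readonly Dictionary<string, Type> KnownTypes =
        new Dictionary<string, Type>
        {
            { "Customer", typeof(Customer) },
            { "Order", typeof(Order) },
        };

    public static object Deserialize(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Skips the XML declaration, comments and whitespace,
            // and positions the reader on the root element.
            reader.MoveToContent();
            string rootName = reader.LocalName;

            Type type;
            if (!KnownTypes.TryGetValue(rootName, out type))
            {
                throw new InvalidOperationException(
                    "Unknown root element: " + rootName);
            }

            // The reader is still positioned on the root element,
            // so the serializer can consume the document from here.
            XmlSerializer serializer = new XmlSerializer(type);
            return serializer.Deserialize(reader);
        }
    }
}
```

Called as XmlTypeResolver.Deserialize(myXmlAsString), this returns an object whose runtime type depends on the root tag, and the string is only parsed once because the same reader is reused for deserialization.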

I wouldn't worry about the performance of this approach because most time will be spent on 1) shuttling the entire string from your database server to the client and 2) deserializing the contents. There are ways of cutting down on both 1) and 2), but that's a bit out of scope for this question.

Another solution which may or may not be appropriate for your situation is to use the XML support in SQL Server to read the root element (How to get the ROOT node name from SQL Server) as this will allow you to skip returning the element at all if you're not interested. This shifts processing to the server, which may or may not have favorable performance.

score 0 · Answer 2 · answered Oct 20 '14 at 11:48

XmlReader.ValueType returns the CLR type of the current node's value (string by default), so it can't be used to determine your serialized custom class.
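A small sketch that illustrates the difference, assuming a plain reader with no schema attached. The ReaderProbe class and its method names are hypothetical. It reads ValueType and LocalName at the root element: the first is the type of the node's value, the second is the tag name you would map to your own class.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReaderProbe
{
    // Returns the CLR value type reported for the root node.
    public static Type RootValueType(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ValueType;
        }
    }

    // Returns the name of the root element.
    public static string RootName(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.LocalName;
        }
    }
}
```

For an input such as "<Order><Id>42</Id></Order>", RootValueType reports the string type, as described in the question, while RootName yields "Order".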

Regular expressions are not the most convenient solution for parsing XML data; use an XML-dedicated tool to check the name of your first element (e.g. LINQ to XML).
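With LINQ to XML, reading the root name could look roughly like this. The RootNameReader helper is a hypothetical name used for illustration. Note that XDocument.Parse loads the whole document into memory, so for large payloads this may cost more than a forward-only XmlReader.

```csharp
using System;
using System.Xml.Linq;

public static class RootNameReader
{
    // Parses the XML string and returns the local name of its root element.
    public static string GetRootName(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root.Name.LocalName;
    }
}
```

The returned name can then be mapped to a type and passed to XmlSerializer as usual.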
