Correct XML serialization and deserialization of "mixed" types in .NET

Question

My current task involves writing a class library for processing HL7 CDA files.
These HL7 CDA files are XML files with a defined XML schema, so I used xsd.exe to generate .NET classes for XML serialization and deserialization.

The XML Schema contains various types which contain the mixed="true" attribute, specifying that an XML node of this type may contain normal text mixed with other XML nodes.
The relevant part of the XML schema for one of these types looks like this:

<xs:complexType name="StrucDoc.Paragraph" mixed="true">
    <xs:sequence>
        <xs:element name="caption" type="StrucDoc.Caption" minOccurs="0"/>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="br" type="StrucDoc.Br"/>
            <xs:element name="sub" type="StrucDoc.Sub"/>
            <xs:element name="sup" type="StrucDoc.Sup"/>
            <!-- ...other possible nodes... -->
        </xs:choice>
    </xs:sequence>
    <xs:attribute name="ID" type="xs:ID"/>
    <!-- ...other attributes... -->
</xs:complexType>

The generated code for this type looks like this:

/// <remarks/>
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(TypeName="StrucDoc.Paragraph", Namespace="urn:hl7-org:v3")]
public partial class StrucDocParagraph {

    private StrucDocCaption captionField;

    private object[] itemsField;

    private string[] textField;

    private string idField;

    // ...fields for other attributes...

    /// <remarks/>
    public StrucDocCaption caption {
        get {
            return this.captionField;
        }
        set {
            this.captionField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("br", typeof(StrucDocBr))]
    [System.Xml.Serialization.XmlElementAttribute("sub", typeof(StrucDocSub))]
    [System.Xml.Serialization.XmlElementAttribute("sup", typeof(StrucDocSup))]
    // ...other possible nodes...
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlTextAttribute()]
    public string[] Text {
        get {
            return this.textField;
        }
        set {
            this.textField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
    public string ID {
        get {
            return this.idField;
        }
        set {
            this.idField = value;
        }
    }

    // ...properties for other attributes...
}

If I deserialize an XML element where the paragraph node looks like this:

<paragraph>first line<br /><br />third line</paragraph>

The result is that the item and text arrays are read like this:

itemsField = new object[]
{
    new StrucDocBr(),
    new StrucDocBr(),
};
textField = new string[]
{
    "first line",
    "third line",
};

From these two arrays alone, there is no way to determine the original order of the text and the other nodes.
If I serialize this again, the result looks like this:

<paragraph>
    <br />
    <br />first linethird line
</paragraph>

The default serializer just serializes the items first and then the text.

I tried implementing IXmlSerializable on the StrucDocParagraph class so that I could control the deserialization and serialization of the content, but it is rather complex because so many classes are involved. I have not arrived at a solution yet, and I don't know whether the effort would pay off.

Is there some kind of easy workaround to this problem, or is it even possible by doing custom serialization via IXmlSerializable? Or should I just use XmlDocument or XmlReader/XmlWriter to process these documents?
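
For comparison, here is a minimal sketch of the XmlDocument route mentioned above. It only illustrates that a DOM keeps text and element children in document order; the class name MixedContentReader and the method Describe are hypothetical names chosen for this example, not part of the generated code.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: lists the children of the root element in document order.
public static class MixedContentReader
{
    public static List<string> Describe(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            if (node.NodeType == XmlNodeType.Text)
            {
                // Text between elements shows up as its own node.
                result.Add("text:" + node.Value);
            }
            else if (node.NodeType == XmlNodeType.Element)
            {
                result.Add("element:" + node.LocalName);
            }
            // Comments and other node types are skipped in this sketch.
        }
        return result;
    }
}
```

Called with the paragraph from the question, Describe should report the first text node, the two br elements, and the second text node, in that order. The trade-off is that this works on untyped nodes rather than on the generated classes.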

score 22 · Accepted Answer · answered Apr 06 '10 at 10:03

To solve this problem I had to modify the generated classes:

Move the XmlTextAttribute from the Text property to the Items property and add the parameter Type = typeof(string)
Remove the Text property
Remove the textField field

As a result the generated code (modified) looks like this:

/// <remarks/>
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(TypeName="StrucDoc.Paragraph", Namespace="urn:hl7-org:v3")]
public partial class StrucDocParagraph {

    private StrucDocCaption captionField;

    private object[] itemsField;

    private string idField;

    // ...fields for other attributes...

    /// <remarks/>
    public StrucDocCaption caption {
        get {
            return this.captionField;
        }
        set {
            this.captionField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("br", typeof(StrucDocBr))]
    [System.Xml.Serialization.XmlElementAttribute("sub", typeof(StrucDocSub))]
    [System.Xml.Serialization.XmlElementAttribute("sup", typeof(StrucDocSup))]
    // ...other possible nodes...
    [System.Xml.Serialization.XmlTextAttribute(typeof(string))]
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
    public string ID {
        get {
            return this.idField;
        }
        set {
            this.idField = value;
        }
    }

    // ...properties for other attributes...
}

Now if I deserialize an XML element where the paragraph node looks like this:

<paragraph>first line<br /><br />third line</paragraph>

The result is that the item array is read like this:

itemsField = new object[]
{
    "first line",
    new StrucDocBr(),
    new StrucDocBr(),
    "third line",
};

This is exactly what I need: the order of the items and their content is correct.
And if I serialize this again, the result is again correct:

<paragraph>first line<br /><br />third line</paragraph>
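
The round trip can be tried in isolation with a reduced model. The following is a minimal sketch, assuming two hypothetical stand-in classes (Paragraph and Br) in place of the generated StrucDocParagraph and StrucDocBr, with the namespace and all attributes left out. MixedContentDemo, Parse and Write are illustrative names only.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical, reduced stand-in for the generated StrucDocBr class.
public class Br { }

// Hypothetical, reduced stand-in for StrucDocParagraph.
[XmlRoot("paragraph")]
public class Paragraph
{
    // Elements and text share one array, so document order is kept.
    [XmlElement("br", typeof(Br))]
    [XmlText(typeof(string))]
    public object[] Items { get; set; }
}

public static class MixedContentDemo
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(Paragraph));

    // Deserializes a <paragraph> element from a string.
    public static Paragraph Parse(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (Paragraph)Serializer.Deserialize(reader);
        }
    }

    // Serializes without XML declaration, indentation, or xsi/xsd namespaces.
    public static string Write(Paragraph paragraph)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", ""); // suppress the default namespace declarations

        var output = new StringBuilder();
        using (var writer = XmlWriter.Create(output, settings))
        {
            Serializer.Serialize(writer, paragraph, namespaces);
        }
        return output.ToString();
    }
}
```

Passing the paragraph above through MixedContentDemo.Parse and then MixedContentDemo.Write should return the text and the br elements in their original order.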

What pointed me in the right direction was the answer by Guillaume; I had also thought that it must be possible like this. And then there was this passage in the MSDN documentation for XmlTextAttribute:

You can apply the XmlTextAttribute to a field or property that returns an array of strings. You can also apply the attribute to an array of type Object but you must set the Type property to string. In that case, any strings inserted into the array are serialized as XML text.

So the serialization and deserialization work correctly now, but I don't know if there are any other side effects. Maybe it's no longer possible to generate a schema from these classes with xsd.exe, but I don't need that anyway.
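
One consequence of the combined array is that consuming code has to check the runtime type of each entry. A minimal sketch of that, assuming an empty stand-in for the generated StrucDocBr class and a hypothetical helper named ParagraphText that flattens a paragraph to plain text:

```csharp
using System.Text;

// Hypothetical stand-in for the generated class of the same name.
public class StrucDocBr { }

public static class ParagraphText
{
    // Flattens mixed content: strings are copied, <br /> becomes a newline.
    public static string ToPlainText(object[] items)
    {
        var builder = new StringBuilder();
        if (items == null)
        {
            return string.Empty;
        }

        foreach (object item in items)
        {
            if (item is string text)
            {
                builder.Append(text);
            }
            else if (item is StrucDocBr)
            {
                builder.Append('\n');
            }
            // Other node types (sub, sup, ...) are ignored in this sketch.
        }
        return builder.ToString();
    }
}
```

For the itemsField array shown above, this yields the two text fragments separated by two line breaks.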

This doesn't seem to work anymore (my System.Xml version is 4.0.0). The problem is that it tracks the names of the elements in the Items array via an ItemsElementName string array, and the elements must match 1-to-1. This requirement causes an error if you are working from an object model that you have populated by deserializing an XML document, because the XmlSerializer does not put representative entries in the ItemsElementName array for them. So a text node followed by an XML element followed by a text node results in 3 entries in your Items array, but only 1 in ItemsElementName. – theta-fish, Oct 30 '17 at 20:25
Thanks, I was having the exact same problem with HL7 CDA schema and this worked perfectly :) – user544511, May 14 '18 at 11:52

score 3 · Answer 2 · answered Mar 24 '11 at 14:43

I had the same problem as this, and came across this solution of altering the .cs generated by xsd.exe. Although it did work, I wasn't comfortable with altering the generated code, as I would need to remember to do it any time I regenerated the classes. It also led to some awkward code which had to test for and cast to XmlNode[] for the mailto elements.

My solution was to rethink the xsd. I ditched the use of the mixed type, and essentially defined my own mixed type.

I had this

XML: <text>some text <mailto>me@email.com</mailto>some more text</text>

<xs:complexType name="text" mixed="true">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" name="mailto" type="xs:string" />
    </xs:sequence>
  </xs:complexType>

and changed to

XML: <mytext><text>some text </text><mailto>me@email.com</mailto><text>some more text</text></mytext>

<xs:complexType name="mytext">
    <xs:sequence>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="text">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string" />
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
        <xs:element name="mailto">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string" />
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>

My generated code now gives me a class myText:

public partial class myText{

    private object[] itemsField;

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("mailto", typeof(myTextTextMailto))]
    [System.Xml.Serialization.XmlElementAttribute("text", typeof(myTextText))]
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }
}

The order of the elements is now preserved during serialization and deserialization, but I do have to test for, cast to, and program against the types myTextTextMailto and myTextText.

Just thought I'd throw that in as an alternative approach which worked for me.

I agree that your approach is the preferred solution to this problem for someone who defines and uses his own XML schema. My problem was that I did not have the option to alter the XSD because it was controlled by a third party. So I had to modify the generated classes which is, for the reasons you stated, something that should only be done if there is no other way. — Stefan Podskubka, Mar 25 '11 at 08:04

score 0 · Answer 3 · Guillaume · 2010-04-02T15:47:14.490

What about

itemsField = new object[] 
{ 
    "first line", 
    new StrucDocBr(), 
    new StrucDocBr(), 
    "third line", 
};

?

edited Apr 02 '10 at 15:47

answered Apr 02 '10 at 15:25


This results in an InvalidOperationException when I try to serialize the object, because of the strings inside itemsField: the array may only contain objects of the types specified by the [XmlElement] attributes on the public property 'Items'. – Stefan Podskubka Apr 06 '10 at 06:37
You may find help here: http://msdn.microsoft.com/en-us/library/kz8z99ds.aspx Any schema validation warnings? – Guillaume Apr 06 '10 at 08:06
I already found that page during my searches, but it is about another problem. The schema of my XML documents is correct; I validate it both before deserialization and after serialization. But I did find the answer to my problem just now. Your suggestion with the itemsField array was already close; it just needed some further modifications in the generated code. I will post it in a few minutes. – Stefan Podskubka Apr 06 '10 at 09:09
