How to deserialize only part of an XML document in C#

Question

Here's a fictitious example of the problem I'm trying to solve. If I'm working in C#, and have XML like this:

<?xml version="1.0" encoding="utf-8"?>
<Cars>
  <Car>
    <StockNumber>1020</StockNumber>
    <Make>Nissan</Make>
    <Model>Sentra</Model>
  </Car>
  <Car>
    <StockNumber>1010</StockNumber>
    <Make>Toyota</Make>
    <Model>Corolla</Model>
  </Car>
  <SalesPerson>
    <Company>Acme Sales</Company>
    <Position>
       <Salary>
          <Amount>1000</Amount>
          <Unit>Dollars</Unit>
    ... and on... and on....
  </SalesPerson>
</Cars>

The XML inside SalesPerson can be very long, megabytes in size. I want to deserialize the Car elements but not the SalesPerson XML element, keeping that one in raw form "for later on".

Essentially I would like to be able to use this as an object representation of the XML.

[System.Xml.Serialization.XmlRootAttribute("Cars", Namespace = "", IsNullable = false)]
public class Cars
{
    [XmlArrayItem(typeof(Car))]
    public Car[] Car { get; set; }

    public Stream SalesPerson { get; set; }
}

public class Car
{
    [System.Xml.Serialization.XmlElementAttribute("StockNumber")]
    public string StockNumber{ get; set; }

    [System.Xml.Serialization.XmlElementAttribute("Make")]
    public string Make{ get; set; }

    [System.Xml.Serialization.XmlElementAttribute("Model")]
    public string Model{ get; set; }
}

where the SalesPerson property on the Cars object would contain a stream with the raw xml that is within the <SalesPerson> xml element after being run through an XmlSerializer.

Can this be done? Can I choose to only deserialize "part of" an xml document?

Thanks! -Mike

p.s. example xml stolen from How to Deserialize XML document

score 43 · Accepted Answer · answered Feb 12 '10 at 11:53

This might be a bit of an old thread, but I will post anyway. I had the same problem (I needed to deserialize about 10 KB of data from a file that was more than 1 MB). In the main object (which has an InnerObject that needs to be deserialized) I implemented the IXmlSerializable interface, then changed the ReadXml method.

ReadXml receives an XmlReader as input. The first step is to read until the wanted XML tag:

reader.ReadToDescendant("InnerObjectTag"); // tag which matches the InnerObject

Then create an XmlSerializer for the type of the object we want to deserialize, and deserialize it:

XmlSerializer serializer = new XmlSerializer(typeof(InnerObject));

// ReadSubtree() gives the serializer only the part of the XML that holds the innerObject data
this.innerObject = (InnerObject)serializer.Deserialize(reader.ReadSubtree());

reader.Close(); // now skip the rest

This saved me a lot of deserialization time and lets me read just a part of the XML (just some details that describe the file, which might help the user decide whether the file is the one they want to load).
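
The core of this approach (skip ahead with ReadToDescendant, then hand only ReadSubtree() to the serializer) can be sketched outside of IXmlSerializable as well. This is a minimal sketch, not the answerer's actual code: the InnerObjectTag class, its Title property, and the PartialReader helper are hypothetical names, and the class is deliberately named after the element so that the default root name matches.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical inner type; the class name matches the element name on purpose.
public class InnerObjectTag
{
    public string Title { get; set; }
}

public static class PartialReader
{
    // One shared serializer instance for the inner type.
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(InnerObjectTag));

    // Returns the first <InnerObjectTag> descendant, or null if there is none.
    public static InnerObjectTag ReadInner(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // position on the root element
            if (!reader.ReadToDescendant("InnerObjectTag"))
            {
                return null;
            }
            // The subtree reader exposes only the current element and its children.
            using (XmlReader subtree = reader.ReadSubtree())
            {
                return (InnerObjectTag)Serializer.Deserialize(subtree);
            }
            // Disposing the outer reader here leaves the rest of the document unread.
        }
    }
}
```

Everything that precedes the wanted element is still read by the XmlReader, but nothing except the subtree is turned into objects.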

Great solution. I found, however, that I also needed to set the XML root of the fragment to avoid an exception with an inner exception saying ... xmlns=''> was not expected. I added another answer for my solution because of the limits on comment length. — Stig Schmidt Nielsson, Apr 14 '16 at 06:55

score 7 · Answer 2 · edited May 23 '17 at 12:09

The accepted answer from user271807 is a great solution, but I found that I also needed to set the XML root of the fragment to avoid an exception with an inner exception saying something like this:

...xmlns=''> was not expected

This exception was thrown when I tried to deserialize only the inner Authentication element of this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<Api>
  <Authentication>
      <sessionid>xxx</sessionid>
      <errormessage>xxx</errormessage>
  </Authentication>
</Api>

So I ended up creating this extension method as a reusable solution. Warning: it contains a memory leak, see below:

public static T DeserializeXml<T>(this string @this, string innerStartTag = null)
{
    using (var stringReader = new StringReader(@this))
    using (var xmlReader = XmlReader.Create(stringReader)) {
        if (innerStartTag != null) {
            xmlReader.ReadToDescendant(innerStartTag);
            var xmlSerializer = new XmlSerializer(typeof(T), new XmlRootAttribute(innerStartTag));
            return (T)xmlSerializer.Deserialize(xmlReader.ReadSubtree());
        }
        return (T)new XmlSerializer(typeof(T)).Deserialize(xmlReader);
    }
}

Update 20th March 2017: As the comment below points out, there is a memory leak when using one of the constructors of XmlSerializer, so I ended up using a caching solution as shown below:

    /// <summary>
    ///     Deserialize XML string, optionally only an inner fragment of the XML, as specified by the innerStartTag parameter.
    /// </summary>
    public static T DeserializeXml<T>(this string @this, string innerStartTag = null) {
        using (var stringReader = new StringReader(@this)) {
            using (var xmlReader = XmlReader.Create(stringReader)) {
                if (innerStartTag != null) {
                    xmlReader.ReadToDescendant(innerStartTag);
                    var xmlSerializer = CachingXmlSerializerFactory.Create(typeof (T), new XmlRootAttribute(innerStartTag));
                    return (T) xmlSerializer.Deserialize(xmlReader.ReadSubtree());
                }
                return (T) CachingXmlSerializerFactory.Create(typeof (T)).Deserialize(xmlReader);
            }
        }
    }
/// <summary>
///     A caching factory to avoid memory leaks in the XmlSerializer class.
/// See http://dotnetcodebox.blogspot.dk/2013/01/xmlserializer-class-may-result-in.html
/// </summary>
public static class CachingXmlSerializerFactory {
    private static readonly ConcurrentDictionary<string, XmlSerializer> Cache = new ConcurrentDictionary<string, XmlSerializer>();
    public static XmlSerializer Create(Type type, XmlRootAttribute root) {
        if (type == null) {
            throw new ArgumentNullException(nameof(type));
        }
        if (root == null) {
            throw new ArgumentNullException(nameof(root));
        }
        var key = string.Format(CultureInfo.InvariantCulture, "{0}:{1}", type, root.ElementName);
        return Cache.GetOrAdd(key, _ => new XmlSerializer(type, root));
    }
    public static XmlSerializer Create<T>(XmlRootAttribute root) {
        return Create(typeof (T), root);
    }
    public static XmlSerializer Create<T>() {
        return Create(typeof (T));
    }
    public static XmlSerializer Create<T>(string defaultNamespace) {
        return Create(typeof (T), defaultNamespace);
    }
    public static XmlSerializer Create(Type type) {
        return new XmlSerializer(type);
    }
    public static XmlSerializer Create(Type type, string defaultNamespace) {
        return new XmlSerializer(type, defaultNamespace);
    }
}

I was working on a similar problem and I found your question and [this blog post](https://blogs.msdn.microsoft.com/tess/2006/02/15/net-memory-leak-xmlserializing-your-way-to-a-memory-leak/) about a memory leak when using the constructor XmlSerializer(Type, XmlRootAttribute). You need to check your code. I think your method is creating a new temporary assembly every time it is called. You'll probably have to perform manual caching for each Type+innerStartTag combination. — plushpuffin, Mar 18 '17 at 04:19
Yes thank you for reminding me. I have updated my answer with a fix. — Stig Schmidt Nielsson, Mar 20 '17 at 11:41

score 3 · Answer 3 · answered Dec 15 '08 at 21:58

You can control how your serialization is done by implementing the ISerializable interface in your class. Note that this also implies a constructor with the signature (SerializationInfo info, StreamingContext context), and you can certainly do what you are asking with that.

However, have a close look at whether you really need to do this with streaming, because if you don't have to use the streaming mechanism, achieving the same thing with LINQ to XML will be easier and simpler to maintain in the long term (IMO).
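
As a rough illustration of the LINQ to XML route, here is a minimal sketch against the XML from the question. The CarsLinqReader helper and its Read method are made-up names. Note that XDocument.Parse loads the whole document into memory, so this trades memory for simplicity.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

// Hypothetical helper, not part of any library.
public static class CarsLinqReader
{
    // Returns the cars as objects and the <SalesPerson> element as raw XML text.
    public static (List<Car> Cars, string SalesPersonXml) Read(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;

        List<Car> cars = root.Elements("Car")
            .Select(c => new Car
            {
                StockNumber = (string)c.Element("StockNumber"),
                Make = (string)c.Element("Make"),
                Model = (string)c.Element("Model"),
            })
            .ToList();

        // Stays null when the document has no <SalesPerson> element.
        string salesPersonXml = root.Element("SalesPerson")?.ToString(SaveOptions.DisableFormatting);

        return (cars, salesPersonXml);
    }
}
```

The SalesPerson markup is never mapped to a class here; it is only re-serialized to a string that can be passed along as-is.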

score 2 · Answer 4 · answered Dec 15 '08 at 22:57

I think the previous commenter is correct in his comment that XML might not be the best choice of a backing store here.

If you are having issues of scale and aren't taking advantage of some of the other niceties you get with XML, like transforms, you might be better off using a database for your data. The operations you are doing really seem to fit more into that model.

I know this doesn't really answer your question, but I thought I would highlight an alternate solution you might use. A good database and an appropriate OR mapper like .netTiers, NHibernate, or more recently LINQ to SQL / Entity Framework would probably get you back up and running with minimal changes to the rest of your codebase.

He may be just a consumer on an ESB, so he cannot change his datastore. Reading parts of XML documents is a legitimate process. With a low-level XmlReader it is possible to index a file/stream and seek/jump directly to any position in the document. — mo., Nov 10 '10 at 23:53

score 1 · Answer 5 · answered Jul 27 '09 at 18:42

Please try defining the SalesPerson property as type XmlElement. This works for output from ASMX web services, which use XML Serialization. I would think it would work on input as well. I would expect the entire <SalesPerson> element to wind up in the XmlElement.

John Saunders

They may also need the XmlAnyAttribute on that member. – Steven Sudit Jul 31 '09 at 16:44
I may be mistaken, actually, since it looks like XmlAny is for a property that returns an *array* of XmlElements, not just one. – Steven Sudit Jul 31 '09 at 20:19
I just re-read the description more carefully, and it looks like XmlAnyElement and XmlAnyAttribute are for slicing. They're catch-alls for the stuff that the XSD doesn't find a place for. – Steven Sudit Jul 31 '09 at 20:21
I'm not talking about `XmlElementAttribute`. I'm talking about `System.Xml.XmlElement`. – John Saunders Aug 01 '09 at 00:20
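
One variant of this idea that combines the answer with the comments is sketched below. It assumes the named form of XmlAnyElement on a single System.Xml.XmlElement property, and it maps the repeated Car elements with XmlElement("Car") rather than the XmlArrayItem attribute from the question; both choices are assumptions of this sketch, not something the answer states. The CarsXmlElementDemo helper is a made-up name.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

[XmlRoot("Cars")]
public class Cars
{
    // The <Car> elements sit directly under <Cars>, with no wrapper element.
    [XmlElement("Car")]
    public Car[] Car { get; set; }

    // Captures the <SalesPerson> element as a DOM node instead of mapping it to a class.
    [XmlAnyElement("SalesPerson")]
    public XmlElement SalesPerson { get; set; }
}

// Hypothetical helper for the example.
public static class CarsXmlElementDemo
{
    public static Cars Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Cars));
        using (var reader = new StringReader(xml))
        {
            return (Cars)serializer.Deserialize(reader);
        }
    }
}
```

The raw markup is then available through SalesPerson.OuterXml. The element is still parsed into a DOM tree, so this avoids writing classes for it but not the cost of reading it.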

score 1 · Answer 6 · answered Dec 15 '08 at 21:58

Typically XML deserialization is an all-or-nothing proposition out of the box, so you'll probably need to customize. If you don't do a full deserialization, you run the risk that the xml is malformed within the SalesPerson element, and so the document is invalid.

If you are willing to accept that risk, you'll probably want to do some basic text parsing to break out the SalesPerson elements into a different document using plain text processing facilities, then process the XML.

This is a good example of why XML is not always the correct answer.

score 0 · Answer 7 · answered Jul 27 '09 at 18:40

If all you want to do is parse out the SalesPerson element but keep it as a string, you should use Xsl Transform rather than "Deserialization". If, on the other hand, you want to parse out the SalesPerson element and only populate an object in memory from all the other non-SalesPerson elements, then Xsl Transform might also be the way to go. If the files are way big, you may consider separating them and using Xsl to combine different xml files so that the SalesPerson I/O only occurs when you need it to.

devlord

The use case is that the Car data I want as objects so that my program can interact with it. The SalesPerson XML simply gets sent over the wire to another system, so I don't even need to inspect it. Basically, I need to get all the data, but only care about what the Car elements contain. – Mike Jul 29 '09 at 12:35
If that's the case, then all you have to do is not supply XmlElementAttributes to serialize the non-car data. – devlord Aug 06 '09 at 23:51

score 0 · Answer 8 · answered Aug 06 '09 at 10:39

I would suggest reading the XML manually, using a lightweight method such as XmlReader, XPathDocument, or LINQ to XML.

When you only have to read 3 properties, I suppose you can write code that reads them from that node manually and have full control over how it is executed, instead of relying on serialization/deserialization.

score 0 · Answer 9 · answered Dec 15 '08 at 23:39

You may control what parts of the Cars class are deserialized by implementing the IXmlSerializable interface on the Cars class. Within the ReadXml(XmlReader) method you would read and deserialize the Car elements, but when you reach the SalesPerson element you would read its subtree as a string and then construct a Stream over the textual content using a StreamWriter.

If you never want the XmlSerializer to write out the SalesPerson element, use the [XmlIgnore] attribute. I am not sure what you want to happen when you serialize the Cars class to its XML representation. Are you trying to prevent only the deserialization of the SalesPerson, while still being able to serialize the XML representation of the SalesPerson represented by the Stream?

I could probably provide a code example of this if you want a concrete implementation.
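
For reference, a minimal sketch of what such a ReadXml could look like. It makes a few assumptions that are not in the answer: the Car list is exposed as a hypothetical Items property, the raw XML is wrapped in a MemoryStream of UTF-8 bytes instead of going through a StreamWriter, and writing is left unimplemented.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

[XmlRoot("Cars")]
public class Cars : IXmlSerializable
{
    private static readonly XmlSerializer CarSerializer = new XmlSerializer(typeof(Car));

    // "Items" is a name chosen for this sketch.
    public List<Car> Items { get; } = new List<Car>();

    // Raw, unparsed <SalesPerson> markup.
    public Stream SalesPerson { get; private set; }

    public XmlSchema GetSchema() => null;

    public void ReadXml(XmlReader reader)
    {
        // The reader starts on the <Cars> start tag.
        if (reader.IsEmptyElement)
        {
            reader.Read();
            return;
        }
        reader.ReadStartElement();
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.LocalName == "Car")
            {
                // Deserialize consumes the whole <Car> element.
                Items.Add((Car)CarSerializer.Deserialize(reader));
            }
            else if (reader.LocalName == "SalesPerson")
            {
                // Keep the markup as text; no objects are built from it.
                string raw = reader.ReadOuterXml();
                SalesPerson = new MemoryStream(Encoding.UTF8.GetBytes(raw));
            }
            else
            {
                reader.Skip(); // ignore anything else
            }
        }
        reader.ReadEndElement(); // consume </Cars>
    }

    public void WriteXml(XmlWriter writer)
    {
        throw new NotSupportedException("Writing is out of scope for this sketch.");
    }
}
```

A plain new XmlSerializer(typeof(Cars)).Deserialize(...) call should then run this ReadXml. ReadOuterXml still materializes the SalesPerson markup as one string, which may or may not be acceptable for multi-megabyte content.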
