
I am reading a bunch of XML files into a list (IEnumerable really) of XElements. Then I want to convert the XElement list (these XElements contain a bunch of child-elements) into a list of classes, so I can do later operations with the data more easily.

Now, if I knew the structure of the XElements in advance, this would be easy: I'd just create a class that mimics the XElement structure and fill instances of it with the XElement contents. But here's the caveat; my XML file element structure is mostly similar, but there could be the odd element that has a different structure. To better illustrate the situation, let me give an example.

Let's say my XML files contain a bunch of 'Person' elements. The Person elements have some common children that appear in ALL of them, but some children of Person appear only in some of the elements.

For example all Person elements have these mandatory children:

  <Person>
    <Name/>
    <Age/>
    <City/>
    <Country/>
  </Person>

But, some Person elements may contain additional children as follows:

  <Person>
    <Name/>
    <Age/>
    <City/>
    <Country/>
    <EyeColor/>
    <Profession/>
  </Person>

To make things worse, these child elements can also have mostly similar structure that occasionally varies.

So is there a way that I can go through these XElements in just one loop and put them into an instance that is somehow dynamically created, say, based on the element names or something similar? I could create a class with all the mandatory elements and leave a few additional member variables for the odd new ones, but that's not ideal for two reasons: one, it would be a waste of space, and two, there could be more child elements than I have extra variables in my class.

So I'm looking for a way to create the class instances dynamically to fit the XElement structure. In other words I'd really like to mimic the element structure right down to the deepest level.
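One way to mirror the structure down to any depth, without a predefined class, is to convert each XElement into a tree of dictionaries keyed by element name. This is a minimal sketch under stated assumptions: `XmlTree.ToTree` is a hypothetical helper, a leaf element is stored as its text value, an element with children becomes a nested `Dictionary<string, object>`, repeated sibling names become a `List<object>`, and attributes and mixed text content are ignored.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlTree
{
    // Hypothetical helper: returns a string for a leaf element, or a
    // Dictionary<string, object> keyed by child element name otherwise.
    public static object ToTree(XElement element)
    {
        // An element without child elements is represented by its text.
        if (!element.HasElements)
            return element.Value;

        var result = new Dictionary<string, object>();
        foreach (var group in element.Elements().GroupBy(e => e.Name.LocalName))
        {
            // Convert each child recursively, so nesting is preserved at every level.
            var converted = group.Select(ToTree).ToList();

            // A single child is stored directly; repeated siblings become a list.
            result[group.Key] = converted.Count == 1 ? converted[0] : (object)converted;
        }
        return result;
    }
}
```

Optional children such as EyeColor are then just keys that may or may not be present, so `ContainsKey("EyeColor")` replaces a fixed set of spare member variables. The trade-off is that everything is stored as strings and has to be converted by the caller.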

Thanks in advance!

Sach
  • Typically the purpose of XML is the opposite of what is going on here. It provides a medium for serialization that can be validated, typically with an XSD. If something could just change on the fly from one thing to the next, that would be horrid for data validation. Instead of handling only what an element may or may not have, I would take the route of collecting everything it could possibly contain and then make a serializable class for that. – djangojazz Jan 30 '17 at 23:01
  • I totally agree with you here. But the problem is that I'm dealing with a system in which I'm not allowed to change anything. In other words, I can't do anything about the structure of the XML files. If I had designed it, I would have done it in a better way, but alas, it's not to be. So I have no option but to deal with it as it is and find a way around the problem. – Sach Jan 30 '17 at 23:07
  • I am not saying to change the structure; I am asking whether whoever made it can give you an XSD or some other type of validation file. If, say, you had 3 properties with the potential for eight, I would just make a class with all eight properties. I'll give an example in an answer. – djangojazz Jan 30 '17 at 23:14
  • No, that too is out of my control. I can't get a validation file, and I also don't know the potential in advance; it could be 8 or it could be 80. And to make it worse, even the structure of the elements is subject to change in the future, though only occasionally. – Sach Jan 30 '17 at 23:20

3 Answers


I think the best route, personally, would be to get an XSD. If you cannot get that, then make up a serializable class that has all the possibilities and reference that. For example: you have two fields, where one gets set sometimes and the other you have never seen set, but a spec somewhere says it might be.

So let's make up a pretend class:

using System;
using System.Collections.Generic;
using System.Xml.Serialization;

namespace GenericTesting.Models
{
  [Serializable()]
  public class Location
  {                                                                                
    [XmlAttribute()]
    public int Id { get; set; }
    [XmlAttribute()]
    public double PercentUsed { get; set; }
    [XmlElement]
    public string ExtraGarbage { get; set; }
    [XmlText]
    public string UsedOnceInTheUniverse { get; set; }
  }
}

And for the purpose of serializing/deserializing, here are extension methods for those:

using System.IO;
using System.Xml;
using System.Xml.Serialization;

namespace GenericTesting
{
  static class ExtensionHelper
  {
    public static string SerializeToXml<T>(this T valueToSerialize)
    {
      // An empty prefix/namespace pair suppresses the default xmlns:xsi and xmlns:xsd attributes
      var ns = new XmlSerializerNamespaces();
      ns.Add("", "");

      using (var sw = new StringWriter())
      {
        using (XmlWriter writer = XmlWriter.Create(sw, new XmlWriterSettings { OmitXmlDeclaration = true }))
        {
          var xmler = new XmlSerializer(valueToSerialize.GetType());
          xmler.Serialize(writer, valueToSerialize, ns);
        }

        // The XmlWriter has been disposed (and flushed) at this point
        return sw.ToString();
      }
    }

    public static T DeserializeXml<T>(this string xmlToDeserialize)
    {
      var serializer = new XmlSerializer(typeof(T));

      using (TextReader reader = new StringReader(xmlToDeserialize))
      {
        return (T)serializer.Deserialize(reader);
      }
    }
  }
}

And a simple main entry point in a console app:

static void Main(string[] args)
{
  var locations = new List<Location>
    {
      new Location { Id = 1, PercentUsed = 0.5, ExtraGarbage = "really important I'm sure"},
      new Location { Id = 2, PercentUsed = 0.6},
      new Location { Id = 3, PercentUsed = 0.7},
    };

  var serialized = locations.SerializeToXml();

  var deserialized = serialized.DeserializeXml<List<Location>>();

  Console.ReadLine();
}

I know this is not exactly what you are asking for, but I personally think strong typing is better for XML, and any third party you ever deal with should have, at the very least, some type of spec sheet or details on what they are giving you. Otherwise you are losing standards. XML should not be created dynamically through reflection or other means, as it is meant, if anything, to enforce strict typing.
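A middle ground between a fully typed class and a fully dynamic structure is to map the known children to properties and let XmlSerializer collect everything else into an array marked with `[XmlAnyElement]`. The sketch below also deserializes straight from an XElement via `XElement.CreateReader()`, since the question starts from XElements rather than from objects already in memory. It is a minimal sketch: the `Person` class, the `Extras` property, and the `GetExtra`/`FromElement` helpers are hypothetical, and only the four mandatory children from the question are mapped.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical class: mandatory children are typed, everything else is kept as raw XML.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string City { get; set; }
    public string Country { get; set; }

    // Any child element that matches none of the properties above
    // (EyeColor, Profession, or something never seen before) lands here.
    [XmlAnyElement]
    public XmlElement[] Extras { get; set; }

    // Hypothetical lookup for an optional child; null when it is absent.
    public string GetExtra(string name) =>
        Extras?.FirstOrDefault(e => e.LocalName == name)?.InnerText;
}

public static class PersonReader
{
    // Created once and reused for every element.
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Person));

    // Deserializes a <Person> element without converting it to a string first.
    public static Person FromElement(XElement element)
    {
        using (var reader = element.CreateReader())
        {
            return (Person)Serializer.Deserialize(reader);
        }
    }
}
```

With this shape, a new optional child that shows up in a future file is not lost; it simply appears in `Extras` until someone decides to promote it to a typed property.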

djangojazz
  • I'm not sure if I understand you completely here. My requirement is to fill data into something similar to the 'Location' class, from the XElement data which I read from XML files. But from your main() what I gather is you have your data in Location class already, and then you serialize/deserialize it? What am I missing? – Sach Jan 30 '17 at 23:57
  • If you are just looking to echo out the structure, then Alexander gave a perfect answer. However, you wrote this: '...I want to convert the XElement list (these XElements contain a bunch of child-elements) into a list of classes, so I can do later operations with the data more easily.' That is indicative of serializing and deserializing WELL-FORMED objects. All I am suggesting, in the simplest way possible, is to make a class for the object you are parsing, adorn it with the 'Serializable' attribute, and then make a method to deserialize it. – djangojazz Jan 31 '17 at 15:39
  • In my example I used a class I already had from another example and was too lazy to use your structure. But I use the 'Serialize/Deserialize' methods all the time in production code, and they are a good medium for talking to well-formed classes. All I am suggesting is that you don't need to query over XElements and then cast them into objects; the object can do that in and of itself through serialization and deserialization. Otherwise you are going to have to update the POCO class AND the querying code, whereas with my method you just update the POCO class and its attributes. – djangojazz Jan 31 '17 at 15:42
  • I see what you mean now, but this is the reason for my question: "But here's the caveat; my XML file element structure is mostly similar, but there could be the odd element that has a different structure." i.e., I can NOT define a class like Person (or in your case Location), because I don't know in advance what the member variables of it would be. It has some such as Name, Age that are common to all my Person elements, but there are others that are found only in some. And I don't know how many potential such elements are there; it could be 8 or it could be 80. – Sach Jan 31 '17 at 17:11
  • Right, and my suggestion would be to make a class with all 80 properties, not just the 4 you use. If you run my example, you will see that an unused property has no detrimental effect. Whereas if you keep adding as you go, you need to update both your class and your retrieval pattern. I mean, you are seeking a way to build things from how they are received, but even if you do that, how do you stick to strict types? Are you just going to hope to make everything strings? – djangojazz Jan 31 '17 at 19:26
  • Yes, at the end of the day this is 'sort of' what I settled on. Like I said, I don't know whether there are 8 or 80, and even that 80 is a number I just pulled out of my, er..., let's say, hat. I ended up crawling through ALL the XML files associated with this system and made one master XML file with a single Person element template containing all the child elements of Person known so far. The system reads from it to figure out the Person template. Then, whenever I get a new set of XML files, I'll add to the template if there are new XML elements. Thanks! – Sach Feb 01 '17 at 19:23
  • No prob, happy coding. – djangojazz Feb 01 '17 at 19:45
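The template approach Sach settled on (crawling all files to find every known child of Person) can be partly automated. A minimal sketch, assuming the files are already loaded as XElements as in the question: `PersonTemplate.CollectPaths` is a hypothetical helper that records every distinct element path below the Person elements, such as "Name" or "Address/Street", so nested variations are captured too.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class PersonTemplate
{
    // Hypothetical helper: returns every distinct element path found below
    // the given Person elements, e.g. "Name", "Address", "Address/Street".
    public static SortedSet<string> CollectPaths(IEnumerable<XElement> persons)
    {
        var paths = new SortedSet<string>();
        foreach (var person in persons)
            AddPaths(person, "", paths);
        return paths;
    }

    // Walks the children recursively, building "/"-separated paths.
    private static void AddPaths(XElement element, string prefix, SortedSet<string> paths)
    {
        foreach (var child in element.Elements())
        {
            var path = prefix + child.Name.LocalName;
            paths.Add(path); // SortedSet ignores duplicates
            AddPaths(child, path + "/", paths);
        }
    }
}
```

When a new batch of files arrives, comparing its paths with the stored template (for example with `Except`) shows exactly which elements are new.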

If you just want to enumerate over the child elements of <Person>, and the XML is relatively small, you could use LINQ to XML:

var listOfElementChildNames = XDocument.Parse(xml).Element("Person")
                                                  .Elements()
                                                  .Select(e => e.Name)
                                                  .ToList();

Edit:

Instead of .Select(e => e.Name), we could map to any class:

public class Person
{
    public string Name {get;set;}
    public int Age {get;set;}
    public string City {get;set;}
}

var xml = @"<Person>
        <Name>John</Name>
        <Age>25</Age>
        <City>New York</City>
      </Person>";

var people = XDocument.Parse(xml).Elements("Person")
     .Select(p => new Person 
        { 
          Name = p.Element("Name").Value, 
          Age = int.Parse(p.Element("Age").Value),
          City = p.Element("City").Value 
        }).ToList();
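The mapping above throws a NullReferenceException as soon as a Person lacks one of the children, because `p.Element("...")` returns null. LINQ to XML's explicit conversion operators return null for a missing element instead, which suits optional children. This is a minimal sketch: the `Person` class with nullable `Age` and the optional `EyeColor`/`Profession` properties, and the `PersonMapper.Map` helper, are hypothetical. It still requires the names to be known in advance, but one class then handles both Person shapes from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical class: Age is nullable, EyeColor and Profession are optional.
public class Person
{
    public string Name { get; set; }
    public int? Age { get; set; }
    public string City { get; set; }
    public string EyeColor { get; set; }
    public string Profession { get; set; }
}

public static class PersonMapper
{
    public static List<Person> Map(XElement root) =>
        root.Elements("Person")
            .Select(p => new Person
            {
                // The explicit casts yield null when the child is absent,
                // instead of throwing like p.Element("...").Value would.
                Name = (string)p.Element("Name"),
                Age = (int?)p.Element("Age"),
                City = (string)p.Element("City"),
                EyeColor = (string)p.Element("EyeColor"),
                Profession = (string)p.Element("Profession")
            })
            .ToList();
}
```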


Alexander V.
  • I've got the part about getting the XML elements covered. My requirement is to convert them into a list of objects that closely mimic the structure of the Person element. – Sach Jan 30 '17 at 23:47
  • I think we're going around in circles here, probably due to a misunderstanding. I could use this solution only if I knew the child elements of the Person element in advance. But I don't, which is the reason for this question. I know "some" of the child elements that all Person elements have, but some may contain extra children that I won't know about until I read the files at runtime. So my requirement is to do what you've just done, but for elements whose structure is unknown until runtime. – Sach Jan 31 '17 at 16:34

Let me first apologize for the VB, but that is what I do.

If I understand what you are wanting you could use a Dictionary. I shortened your example to have fewer mandatory items, but hopefully you get the idea. Here is the person class that simply iterates the children adding them to the dictionary by their element name.

Public Class Person

    Private _dict As New Dictionary(Of String, XElement)
    Public Sub New(persEL As XElement)
        'if the class is intended to modify the original XML
        'use this declaration. 
        Dim aPers As XElement = persEL
        'if the original XML will go away during the class lifetime
        'use this declaration. 
        'Dim aPers As XElement = New XElement(persEL)

        For Each el As XElement In aPers.Elements
            'note: Add throws an ArgumentException if two children share a name
            Me._dict.Add(el.Name.LocalName, el)
        Next
    End Sub

    'mandatory children are done like this
    Public Property Name() As String
        Get
            Return Me._dict("Name").Value
        End Get
        Set(ByVal value As String)
            Me._dict("Name").Value = value
        End Set
    End Property

    Public Property Age() As Integer
        Get
            Return CInt(Me._dict("Age").Value)
        End Get
        Set(ByVal value As Integer)
            Me._dict("Age").Value = value.ToString
        End Set
    End Property
    'end mandatory children

    Public Property OtherChildren(key As String) As String
        Get
            Return Me._dict(key).Value
        End Get
        Set(ByVal value As String)
            Me._dict(key).Value = value
        End Set
    End Property

    Public Function HasChild(key As String) As Boolean
        Return Me._dict.ContainsKey(key)
    End Function
End Class
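For C# readers, here is a minimal sketch of the same dictionary idea. Two behaviours are assumptions rather than part of the VB answer: a missing optional child yields null (or false on write) instead of a KeyNotFoundException, and a repeated child name keeps its first occurrence instead of making `Add` throw. As in the VB version, reads and writes go through the wrapped XElement, so the original XML is modified in place. `GetChild` and `SetChild` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public class Person
{
    private readonly Dictionary<string, XElement> _children = new Dictionary<string, XElement>();

    public Person(XElement person)
    {
        foreach (var child in person.Elements())
        {
            // Keep the first occurrence if a child name repeats.
            if (!_children.ContainsKey(child.Name.LocalName))
                _children[child.Name.LocalName] = child;
        }
    }

    // Mandatory children as typed properties backed by the XML.
    public string Name
    {
        get => _children["Name"].Value;
        set => _children["Name"].Value = value;
    }

    public int Age
    {
        get => int.Parse(_children["Age"].Value);
        set => _children["Age"].Value = value.ToString();
    }

    // Optional children: null when the element is absent.
    public string GetChild(string name) =>
        _children.TryGetValue(name, out var el) ? el.Value : null;

    // Updates an existing child; returns false if there is no such child.
    public bool SetChild(string name, string value)
    {
        if (!_children.TryGetValue(name, out var el))
            return false;
        el.Value = value;
        return true;
    }

    public bool HasChild(string name) => _children.ContainsKey(name);
}
```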

Here is a simple test to see how it works

    Dim onePersXE As XElement = <Person>
                                    <Name>C</Name>
                                    <Age>22</Age>
                                    <Opt1>optional C1</Opt1>
                                    <Opt2>optional C2</Opt2>
                                </Person>

    Dim onePers As New Person(onePersXE)
    onePers.Name = "new name"
    onePers.Age = 42
    onePers.OtherChildren("Opt1") = "new opt1 value"
    onePers.OtherChildren("Opt2") = "opt 2 has new value"

As you can see, there are two mandatory elements and, in this case, two optional children.

Here is another example showing how a list of persons might be built:

    Dim persons As XElement
    persons = <persons>
                  <Person>
                      <Name>A</Name>
                      <Age>32</Age>
                  </Person>
                  <Person>
                      <Name>B</Name>
                      <Age>42</Age>
                      <Opt1>optional B1</Opt1>
                      <Opt2>optional B2</Opt2>
                  </Person>
              </persons>


    Dim persList As New List(Of Person)
    For Each el As XElement In persons.Elements
        persList.Add(New Person(el))
    Next

Hope this at least gives you some ideas.

dbasnett
  • This is not entirely a bad idea, and could be very useful in some cases. Unfortunately it doesn't help with one of my major requirements which is preserving the structure of XML in a class/dictionary or some such format. Thanks though! – Sach Feb 01 '17 at 19:19
  • @Sach - if you use the class as is, you will be manipulating the XML directly, without modifying the structure itself. Take a look at the constructor. – dbasnett Feb 02 '17 at 01:59