
I need to parse a large, complex XML file (100 MB+). I have the XML Schema definitions, but I cannot use xsd2code to generate the deserialization code automatically, because abstract message types are used at the top level of the XML. The structure of the XML file is like this:

<Head>  
    <Batch>   
        <Dog></Dog>   
        <Dog></Dog>  
    </Batch>  
</Head>

The XSD defines Batch as containing abstract animals, not Dog. xsd2code can create the Dog class with the right XML attributes, but the Dog class is defined in a separate XSD file. I tried merging all the XSD files into one, but that did not fix the problem.
Is there a good way like Linq to XML or Xpath to loop over the elements in Batch and create Dog instances without needing to parse Dog manually?

weismat
  • I didn't quite understand your question. Could there be tags other than `<Dog>` in your XML? I understand that you don't want to parse the inner contents of the `<Dog>` tag yourself but want it deserialized directly to an instance of `Dog`, right? – Darin Dimitrov Jan 15 '13 at 06:40
  • There are potentially other animals inside the XML. For now I just want to read all dogs. From the xsd I can generate the dog class, but I know no generic way so far to parse it. – weismat Jan 15 '13 at 06:47
  • As I understand it, you want to create an XML file and pass it on. For that, you just build a string " " in the right format and save it with an .xml extension, so that it is in XML format when you use it. I don't think passing it will be an issue. – Jignesh.Raj Jan 15 '13 at 06:48
  • As stated before, an XML file needs to be parsed/deserialized, not serialized/created. – weismat Jan 15 '13 at 06:49
  • @weismat Just for clarification, do you have class `Dog` with the same schema as xml? If so you can pass `XElement` in constructor of `dog`, `Cat` or whatever class and then use reflection to assign properties. But this will have big overhead. – Leri Jan 15 '13 at 06:59
  • I have a Dog class created with xsd2code, with XML attributes. In my previous uses of xsd2code I was able to deserialize just by using the generated classes and the LoadFromFile method. This time that did not work. – weismat Jan 15 '13 at 07:01

1 Answer


Is there a good way like Linq to XML or Xpath to loop over the elements in Batch and create Dog instances without needing to parse Dog manually?

It depends on what you mean by "manually". I've found it's useful to have a pattern where each relevant class has a static FromXElement factory method (or a constructor taking an XElement) which extracts the relevant details. With LINQ to XML that's pretty straightforward, e.g.

public static Dog FromXElement(XElement element)
{
    // Or whatever...
    return new Dog((string) element.Element("Name"),
                   (double) element.Element("Weight"));
}

Then you can use:

List<Dog> dogs = batch.Elements("Dog")
                      .Select(x => Dog.FromXElement(x))
                      .ToList();

(You may be able to use Select(Dog.FromXElement) instead - it depends on which version of C# you're using.)

To process all the animals in a batch, you'd probably want something like:

private static readonly Dictionary<string, Func<XElement, Animal>> Factories =
    new Dictionary<string, Func<XElement, Animal>>
{
    { "Dog", Dog.FromXElement },
    { "Cat", Cat.FromXElement },
    // etc
};
...
List<Animal> animals = batch.Elements()
                            .Select(x => Factories[x.Name.LocalName](x))
                            .ToList();
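
With a 100 MB+ file, loading the whole document just to obtain `batch` may be more than you want. One possible approach, sketched here under assumptions, is to stream the file with `XmlReader` and materialize only one element at a time via `XNode.ReadFrom`, then feed each `XElement` to the factory methods above. The `BatchStreaming` class and `StreamElements` helper are hypothetical names used for illustration, and the sketch matches on local element names only, ignoring namespaces.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class BatchStreaming
{
    // Hypothetical helper: lazily yields every element whose local name is
    // localName, so only one such element is held in memory at a time.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string localName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
            {
                // XNode.ReadFrom consumes the whole element and leaves the reader
                // on the node that follows it, so no extra Read() is needed here.
                yield return (XElement) XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

Usage would be along the lines of `BatchStreaming.StreamElements(XmlReader.Create(path), "Dog").Select(Dog.FromXElement)`, with the reader disposed once the enumeration is finished. Because the sequence is lazy, calling `ToList()` on it brings every element back into memory.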
Jon Skeet
  • The described Factory method is what I would call manually. I was wondering if I would be able to create the Dog class automatically (as dog itself is rather complex as well and Dog contains the XML attributes after the code generation). It might work if there is for instance an easy way to extract the subtree of the large XML file into a new XML document. – weismat Jan 15 '13 at 06:54
  • @weismat: It's possible that there is - but personally I've never been entirely comfortable with that approach. I've usually found that it takes longer getting the generator to do what I want than to do it manually. Bear in mind that LINQ to XML makes the "manual" version considerably simpler than it was in previous versions. So the above is what I'd do, but of course YMMV. – Jon Skeet Jan 15 '13 at 07:01
  • Dog alone contains three substructures, and the first of them has about 20 properties. It is more about writing easily maintainable code than about an actual need to work this way right now. – weismat Jan 15 '13 at 07:06
  • @weismat: So the `Dog` factory method would just call `FromXElement` on each of the other three... you end up with a lot of code, but it's really *simple* code. But if you've got "dog" in a separate XSD file, what happens if you just try xsd2code on that, then call the generated code appropriately? – Jon Skeet Jan 15 '13 at 07:12
  • I have now tried all sorts of things to change the original XML, but I might throw in the towel and write the suggested factory methods, as I cannot get the automatic deserialization to work. – weismat Jan 15 '13 at 11:21
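
For the idea raised in the comments of handing a single subtree to the generated code, a minimal sketch might look like the following. It assumes the generated `Dog` class can be deserialized by `XmlSerializer` on its own when given a `<Dog>` element as the root. The `Dog` class shown is a hypothetical stand-in for the xsd2code output, and `SubtreeDeserializer` is an illustrative name. If the real XML uses namespaces, or the generated class declares a different root name, the serializer would likely need a matching `XmlRootAttribute`.

```csharp
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for the xsd2code-generated class; the real one
// would carry whatever XML attributes the generator emitted.
public class Dog
{
    public string Name { get; set; }
    public double Weight { get; set; }
}

public static class SubtreeDeserializer
{
    // Deserializes one element (and its subtree) into T using XmlSerializer,
    // treating the element as if it were the root of its own document.
    public static T Deserialize<T>(XElement element)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = element.CreateReader())
        {
            return (T) serializer.Deserialize(reader);
        }
    }
}
```

Each `<Dog>` element from the batch could then be passed through `SubtreeDeserializer.Deserialize<Dog>(element)`, which sidesteps the abstract type declared on Batch because the concrete type is named explicitly.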