
I've read several posts and articles on how to deserialize XML, but I still haven't figured out how to write the code for my case, so I apologize for yet another question about deserializing XML ))

I have a large (50 MB) XML file that I need to deserialize. I used xsd.exe to generate an XSD schema from the document and then to autogenerate a C# classes file, which I added to my project. I want to extract some (not all) of the data from this XML file and put it into my SQL database.

Here is the hierarchy of the file (simplified, xsd is very large):

public class yml_catalog 
{
    public yml_catalogShop[] shop { /*realization*/ }
}

public class yml_catalogShop
{
    public yml_catalogShopOffersOffer[][] offers { /*realization*/ }
}

public class yml_catalogShopOffersOffer
{
    // here goes all the data (properties) I want to obtain ))
}

And here is my code:

first approach:

yml_catalogShopOffersOffer catalog;
var serializer = new XmlSerializer(typeof(yml_catalogShopOffersOffer));
var reader = new StreamReader(@"C:\div_kid.xml");
catalog = (yml_catalogShopOffersOffer) serializer.Deserialize(reader); // exception occurs
reader.Close();

I get InvalidOperationException: There is an error in the XML(3,2) document

second approach:

XmlSerializer ser = new XmlSerializer(typeof(yml_catalogShopOffersOffer));
yml_catalogShopOffersOffer result;
using (XmlReader reader = XmlReader.Create(@"C:\div_kid.xml"))          
{
    result = (yml_catalogShopOffersOffer)ser.Deserialize(reader); // exception occurs
}

InvalidOperationException: There is an error in the XML(0,0) document

third: I tried to deserialize the entire file:

 XmlSerializer ser = new XmlSerializer(typeof(yml_catalog)); // exception occurs
 yml_catalog result;
 using (XmlReader reader = XmlReader.Create(@"C:\div_kid.xml"))          
 {
     result = (yml_catalog)ser.Deserialize(reader);
 }

And I get the following:

error CS0030: The conversion of type "yml_catalogShopOffersOffer[]" into "yml_catalogShopOffersOffer" is not possible.

error CS0029: The implicit conversion of type "yml_catalogShopOffersOffer" into "yml_catalogShopOffersOffer[]" is not possible.

So how can I fix (or rewrite) the code to avoid these exceptions?

Edit: Also, when I write:

XDocument doc = XDocument.Parse(@"C:\div_kid.xml");

An XmlException occurs: data at the root level is invalid, line 1, position 1.

Here is the first line of the XML file:

<?xml version="1.0" encoding="windows-1251"?>

Edit 2: A short example of the XML file:

<?xml version="1.0" encoding="windows-1251"?>
<!DOCTYPE yml_catalog SYSTEM "shops.dtd">
<yml_catalog date="2012-11-01 23:29">
<shop>
   <name>OZON.ru</name>
   <company>?????? "???????????????? ??????????????"</company>
   <url>http://www.ozon.ru/</url>
   <currencies>
     <currency id="RUR" rate="1" />
   </currencies>
   <categories>
      <category id="1126233">base category</category>
      <category id="1127479" parentId="1126233">bla bla bla</category>
      <!-- all the other categories go here -->
   </categories>
   <offers>
      <offer>
         <price></price>
         <picture></picture>
      </offer>
      <!-- other offers -->
   </offers>
</shop>
</yml_catalog>

P.S. I've already accepted the answer (it's perfect). But now I need to find the "base category" for each offer using its categoryId. The data is hierarchical, and the base category is the category that has no "parentId" attribute. So I wrote a recursive method to find the "base category", but it never finishes. It seems the algorithm is not very fast ))
Here is my code (in the Main() method):

var doc = XDocument.Load(@"C:\div_kid.xml");
var offers = doc.Descendants("shop").Elements("offers").Elements("offer");
foreach (var offer in offers.Take(2))
{
    var category = GetCategory(categoryId, doc);
    // here goes other code
}

Helper method:

public static string GetCategory(int categoryId, XDocument document)
{
    var tempId = categoryId;
    var categories = document.Descendants("shop").Elements("categories").Elements("category");
    foreach (var category in categories)
    {
        if (category.Attribute("id").ToString() == categoryId.ToString())
        {
            if (category.Attributes().Count() == 1)
            {
                return category.ToString();
            }
            tempId = Convert.ToInt32(category.Attribute("parentId"));
        }
    }
    return GetCategory(tempId, document);
}

Can I use recursion in this situation? If not, how else can I find the "base category"?

Aleksei Chepovoi
  • Could you give a small sample of what the XML schema looks like by showing us some example data and how you'd expect your objects to get from it? (p.s., you need to `Load()` a file, not `Parse()` it) – Jeff Mercado Jan 27 '13 at 09:30

1 Answer


Give LINQ to XML a try:

XElement result = XElement.Load(@"C:\div_kid.xml");
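
On the XmlException from the question's edit: XDocument.Parse treats its argument as the XML text itself, so a file path such as C:\div_kid.xml is read as (invalid) markup and fails at line 1, position 1. Load is the call that accepts a path or a reader. Here is a minimal sketch of the difference. It uses a made-up inline sample instead of the real file, and the class and method names are only for illustration:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class LoadVersusParse
{
    // Made-up sample that stands in for the contents of the real file.
    public const string SampleXml =
        "<yml_catalog><shop><name>OZON.ru</name></shop></yml_catalog>";

    // Parse expects the XML text itself.
    public static string RootNameViaParse()
    {
        return XDocument.Parse(SampleXml).Root.Name.LocalName;
    }

    // Load expects a path, stream, or reader that yields the XML text.
    public static string RootNameViaLoad()
    {
        using (var reader = new StringReader(SampleXml))
        {
            return XDocument.Load(reader).Root.Name.LocalName;
        }
    }

    // Handing a path to Parse fails, because a path is not well-formed XML.
    public static bool ParseRejectsPath()
    {
        try
        {
            XDocument.Parse(@"C:\div_kid.xml");
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(RootNameViaParse());
        Console.WriteLine(RootNameViaLoad());
        Console.WriteLine(ParseRejectsPath());
    }
}
```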

Querying in LINQ is brilliant but admittedly a little weird at the start. You select nodes from the document either in a SQL-like query syntax or with lambda expressions, and then create anonymous objects (or instances of existing classes) that contain the data you are interested in.

Best is to see it in action.

Based on your sample XML and code, here's a specific example:

var element = XElement.Load(@"C:\div_kid.xml");
var shopsQuery =
    from shop in element.Descendants("shop")
    select new
    {
        Name = (string) shop.Descendants("name").FirstOrDefault(),
        Company = (string) shop.Descendants("company").FirstOrDefault(),
        Categories =
            from category in shop.Descendants("category")
            select new {
                Id = (string) category.Attribute("id"),
                // Base categories have no parentId attribute; the cast yields null
                // for them, whereas .Value would throw a NullReferenceException.
                Parent = (string) category.Attribute("parentId"),
                Name = category.Value
            },
        Offers =
            from offer in shop.Descendants("offer")
            select new {
                Price = (string) offer.Descendants("price").FirstOrDefault(),
                Picture = (string) offer.Descendants("picture").FirstOrDefault()
            }
    };

foreach (var shop in shopsQuery){
    Console.WriteLine(shop.Name);
    Console.WriteLine(shop.Company);
    foreach (var category in shop.Categories)
    {
        Console.WriteLine(category.Name);
        Console.WriteLine(category.Id);
    }
    foreach (var offer in shop.Offers)
    {
        Console.WriteLine(offer.Price);
        Console.WriteLine(offer.Picture);
    }
}  
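
The query above uses the SQL-like form. For comparison, here is a sketch of the offers part written with lambda expressions instead. It runs against a small made-up document rather than the real file, and the names OfferQueries, ReadPrices, and ReadPictures are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class OfferQueries
{
    // Made-up sample in the shape of the question's XML.
    public const string SampleXml =
        "<yml_catalog><shop><offers>" +
        "<offer><price>100</price><picture>a.jpg</picture></offer>" +
        "<offer><price>250</price></offer>" +
        "</offers></shop></yml_catalog>";

    // Method-syntax equivalent of "from offer in ... select (string) price".
    public static List<string> ReadPrices(XElement root)
    {
        return root.Descendants("offer")
                   .Select(offer => (string) offer.Element("price"))
                   .ToList();
    }

    // The (string) cast yields null when the element is missing,
    // so an offer without a picture does not cause an exception.
    public static List<string> ReadPictures(XElement root)
    {
        return root.Descendants("offer")
                   .Select(offer => (string) offer.Element("picture"))
                   .ToList();
    }

    static void Main()
    {
        var root = XElement.Parse(SampleXml);
        foreach (var price in ReadPrices(root))
        {
            Console.WriteLine(price);
        }
    }
}
```

Both forms compile to the same LINQ calls, so the choice is mostly a matter of readability.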

As an extra, here's how to build the tree of categories from the flat category elements. You need a proper class to hold them, because the list of Children must have an element type:

class Category
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public List<Category> Children { get; set; }

    // Despite the name, this yields the category itself as well as everything below it.
    public IEnumerable<Category> Descendants {
        get
        {
            return Children.SelectMany(child => child.Descendants)
                           .Concat(new[] { this });
        }
    }
}

To create a list containing all distinct categories in the document:

var categories = (from category in element.Descendants("category")
                  let id = (int) category.Attribute("id")
                  orderby id
                  select new Category()
                  {
                      Id = id,
                      // The (int?) cast yields null when parentId is missing (a root category)
                      ParentId = (int?) category.Attribute("parentId"),
                      Children = new List<Category>()
                  })
                 // Category does not override Equals, so Distinct() would only compare
                 // references; grouping by Id removes duplicate ids instead.
                 .GroupBy(cat => cat.Id)
                 .Select(group => group.First())
                 .ToList();

Then organize them into a tree (Heavily borrowed from flat list to hierarchy):

var lookup = categories.ToLookup(cat => cat.ParentId);
foreach (var category in categories)
{
    category.Children = lookup[category.Id].ToList();
}
var rootCategories = lookup[null].ToList();

To find the root which contains theCategory:

var root = (from cat in rootCategories
            where cat.Descendants.Contains(theCategory)
            select cat).FirstOrDefault();
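
On the recursion in the question: it appears never to finish because XAttribute.ToString() returns text such as id="1126233" rather than the bare value, so the comparison never matches and tempId never changes. Since each category has at most one parent, an alternative to building the whole tree is to follow the parentId links upward until a category without a parent is reached. Below is a sketch under that assumption. The names BaseCategoryFinder, BuildParentMap, and FindBaseCategoryId are hypothetical, and the sample XML is made up:

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class BaseCategoryFinder
{
    // Made-up sample: 1 is a base category, 2 is its child, 3 is a grandchild.
    public const string SampleXml =
        "<categories>" +
        "<category id=\"1\">base</category>" +
        "<category id=\"2\" parentId=\"1\">child</category>" +
        "<category id=\"3\" parentId=\"2\">grandchild</category>" +
        "</categories>";

    // Maps each category id to its parent id (null for a base category).
    public static Dictionary<int, int?> BuildParentMap(XElement root)
    {
        var parents = new Dictionary<int, int?>();
        foreach (var category in root.Descendants("category"))
        {
            parents[(int) category.Attribute("id")] = (int?) category.Attribute("parentId");
        }
        return parents;
    }

    // Returns the id of the base category, or null if the id is unknown,
    // a parent is missing from the document, or the links form a cycle.
    public static int? FindBaseCategoryId(int categoryId, Dictionary<int, int?> parents)
    {
        var visited = new HashSet<int>();
        var current = categoryId;
        while (visited.Add(current)) // Add returns false once an id repeats
        {
            int? parent;
            if (!parents.TryGetValue(current, out parent))
            {
                return null; // this id is not in the document
            }
            if (parent == null)
            {
                return current; // no parentId: this is the base category
            }
            current = parent.Value;
        }
        return null; // cycle detected
    }

    static void Main()
    {
        var parents = BuildParentMap(XElement.Parse(SampleXml));
        Console.WriteLine(FindBaseCategoryId(3, parents));
    }
}
```

Building the dictionary once and reusing it for every offer avoids rescanning all category elements on each lookup, which matters for a 50 MB file.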
flup
  • The loading works! Console.WriteLine(result.Name); displays "yml_catalog". But how can I get the data on the next levels? Can you provide some code, please? – Aleksei Chepovoi Jan 27 '13 at 08:13
  • I have added links to some samples. But the simplest answer to that is they are accessible through result.Descendants("elementname") – flup Jan 27 '13 at 09:33
  • also I recommend to look through Linq to Xml C# code samples - there is all the information) – Aleksei Chepovoi Jan 27 '13 at 11:15
  • I've added a specific example based on the XML document you provided. – flup Jan 27 '13 at 12:00
  • Thanks for the additional code, it works great! Can you look through the edits I made at the end of my question? (I also need to find the "base category" using "categoryId", and the recursion I use never finishes executing.) Thanks in advance! – Aleksei Chepovoi Jan 27 '13 at 13:26
  • I had to look around for that, and I found http://stackoverflow.com/questions/4694227/flat-list-to-hierarchy . All parent categories must be mentioned in the xml document for this to work, though. – flup Jan 27 '13 at 17:38