
I am parsing an XML document that is in UTF-8 format as follows:

    XDocument doc = GetXmlFeed(url);
    doc.Declaration = new XDeclaration("1.0", "utf-8", "true");
    var root = doc.Root;

    if (year == highestYear)
        data = new TourDetails()
        {
            TourName = root.Element("tourName").Value,
            DetailedItenerary = (from a in root.Element("detailedItinerary").Descendants("detailedItineraryItem")
                                 select new IteneraryItem()
                                 {
                                     Label = a.Attribute("label").Value,
                                     Contents = a.Value
                                 }).ToList()
        };

The contents of DetailedItinerary are in UTF-8 format. But when we save them to the database, we get weird characters like ’ and others.

How do I get Contents to be understood as UTF-8? I assume the part that is not using UTF-8 is the a.Value in the LINQ to XML portion.

Our MySQL server is set to use UTF-8 by default, and so are all the databases we're using.

Does anyone know how to fix it? Thanks!

rksprst

2 Answers


Ok, I seem to have fixed this issue by using:

      Contents = System.Text.Encoding.UTF8.GetString(System.Text.Encoding.Default.GetBytes(a.Value))
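
A minimal sketch of why this round trip can work, under some assumptions. The characters ’ are what the UTF-8 bytes E2 80 99 (a right single quote, U+2019) look like when decoded as Windows-1252, so the sketch assumes that `Encoding.Default` was Windows-1252 on the machine in question. It names that code page explicitly because on .NET Core and .NET 5+ `Encoding.Default` is always UTF-8, where the one-liner above would change nothing. `MojibakeDemo`, `Garble`, and `Repair` are hypothetical names for illustration, and the code assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Assumption: the wrong decoder was Windows-1252 (code page 1252).
    private static Encoding Windows1252()
    {
        // On .NET Core / .NET 5+ legacy code pages must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Reproduces the symptom: UTF-8 bytes decoded with the wrong encoding.
    public static string Garble(string text) =>
        Windows1252().GetString(Encoding.UTF8.GetBytes(text));

    // Reverses it: recover the original bytes, then decode them as UTF-8.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(Windows1252().GetBytes(garbled));
}
```

For example, `MojibakeDemo.Garble("\u2019")` yields the three characters ’, and `MojibakeDemo.Repair` turns them back into the single quote. If this is what is happening, the text was probably decoded with the wrong encoding before it reached `a.Value` (for instance inside `GetXmlFeed`, which is not shown), and fixing it there would be cleaner than repairing each string afterwards.
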
rksprst

You are right that a.Value is not using UTF-8: as soon as the XML is in memory, its text is represented as regular C# strings (UTF-16 internally, no UTF-8 encoding). So being right about that will not help you.
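
To illustrate the point, here is a small sketch showing that LINQ to XML decodes the bytes while loading, so the element value is already an ordinary string by the time it is read. `XmlDecodingDemo` and `ReadContents` are hypothetical names, and the input is assumed to be a tiny UTF-8 document whose root has a `detailedItineraryItem` child.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class XmlDecodingDemo
{
    // Loads XML from raw bytes; the parser picks the encoding from the
    // BOM or XML declaration and falls back to UTF-8 when neither is present.
    public static string ReadContents(byte[] xmlBytes)
    {
        using (var stream = new MemoryStream(xmlBytes))
        {
            XDocument doc = XDocument.Load(stream);
            // Value is a plain .NET string; no UTF-8 bytes remain at this point.
            return doc.Root.Element("detailedItineraryItem").Value;
        }
    }
}
```

If the bytes are handed to the parser like this, a right single quote in the feed comes back as the single character U+2019. Garbled text in `a.Value` would then suggest the bytes were turned into a string with the wrong encoding before parsing.
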

You need to look very carefully at what is stored and what is actually read. If it is a binary field, try to get the byte array first and check what is in there: UTF-8 data may start with a UTF-8 BOM, and then the text should follow. Check whether the BOM is wrong or whether characters are represented as 2 bytes instead of one.

If it is a text field, you may not be able to force UTF-8 and should use another encoding that matches your field's encoding at save time.
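
A rough sketch of the byte-level check described above, assuming you can get the stored value as a `byte[]`. `ByteInspector` and its methods are hypothetical helper names.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ByteInspector
{
    // True if the data begins with the UTF-8 byte order mark (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        return data.Length >= bom.Length && data.Take(bom.Length).SequenceEqual(bom);
    }

    // Hex dump such as "E2-80-99", handy for comparing against expected bytes.
    public static string ToHex(byte[] data) => BitConverter.ToString(data);
}
```

A correctly stored right single quote (U+2019) dumps as `E2-80-99`. If the same character shows up as `C3-A2-E2-82-AC-E2-84-A2`, the text was already garbled into ’ and then encoded as UTF-8 a second time.
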

Alexei Levenkov