
I have some business objects that store customer names. Sometimes a name contains special characters, such as the control character U+0002. The names are imported from a third party, and I cannot remove these characters at the source.

The application serializes and deserializes the customer object with XmlSerializer. The strange thing is that serializing a name with special characters raises no error and produces output like `<Name>Jim &#2;</Name>`, but deserializing that same XML throws the exception "There is an error in XML document (3, 15)".

How should I handle these special characters in my application? Thanks!

Here is some test code:

    using System.IO;
    using System.Xml.Serialization;

    public class Customer
    {
        public string Name;
    }

    class Program
    {
        public static T DeserializeFromXml<T>(string settings) where T : class
        {
            var serializer = new XmlSerializer(typeof(T));
            var reader = new StringReader(settings);
            var result = serializer.Deserialize(reader);
            return result as T;
        }

        public static string SerializeToXml<T>(T settings)
        {
            var serializer = new XmlSerializer(typeof(T));
            var writer = new StringWriter();
            serializer.Serialize(writer, settings);
            return writer.ToString();
        }

        static void Main(string[] args)
        {
            var str = new char[] { 'J', 'i', 'm', (char)2 };
            var customer = new Customer { Name = new string(str) };

            var output = SerializeToXml(customer);

            // Throws: "There is an error in XML document (3, 15)."
            var obj = DeserializeFromXml<Customer>(output);
        }
    }
leppie
Eric
  • Problem solved! I passed `XmlReaderSettings.CheckCharacters = false` to the XmlReader, then it ignored the special characters. – Eric Jul 01 '13 at 06:38
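
The fix described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the `LenientXml` class name is hypothetical, and `Customer` is repeated only to keep the block self-contained. It relies on `XmlReaderSettings.CheckCharacters = false`, which stops the reader from rejecting character references such as `&#x2;`.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Customer
{
    public string Name;
}

// Hypothetical helper class; the name is an assumption made for this sketch.
public static class LenientXml
{
    public static string SerializeToXml<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            // The serializer emits control characters as character references.
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static T DeserializeFromXml<T>(string xml) where T : class
    {
        var serializer = new XmlSerializer(typeof(T));

        // Disable validation of character references, so that a reference
        // to U+0002 no longer causes an exception while reading.
        var settings = new XmlReaderSettings { CheckCharacters = false };

        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            return serializer.Deserialize(xmlReader) as T;
        }
    }
}
```

Note that the resulting XML is still not well-formed XML 1.0, so other parsers that consume the same documents may continue to reject them.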

1 Answer


I don't have a solution for your question, but here is some background information.

The string &#2; is an XML character reference for the character with code point 2. According to XML 1.0, this is not a valid character. See http://www.w3.org/TR/2004/REC-xml-20040204/#NT-Char.
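
To make the rule concrete, here is a small sketch of the XML 1.0 `Char` production as a predicate, plus a helper that reports where a string violates it. The `Xml10Chars` class and its method names are hypothetical and are not part of the .NET framework; the ranges are taken from the specification linked above.

```csharp
using System.Collections.Generic;

// Hypothetical helper; the class and method names are assumptions for this sketch.
public static class Xml10Chars
{
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsValid(int codePoint)
    {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns the UTF-16 indexes of characters that XML 1.0 does not allow.
    public static List<int> FindInvalidIndexes(string text)
    {
        var invalid = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                // A well-formed surrogate pair encodes a code point in
                // [#x10000-#x10FFFF], which is always allowed; skip both halves.
                i++;
                continue;
            }

            // Lone surrogates fall in #xD800-#xDFFF and are reported as invalid.
            if (!IsValid(text[i]))
            {
                invalid.Add(i);
            }
        }
        return invalid;
    }
}
```

A check like this could be run on imported names before serialization, for example to log which records carry characters that XML 1.0 forbids.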

The .NET Framework is inconsistent here. The XML serializer will happily generate XML documents that contain illegal characters, but the deserializer throws as soon as it encounters one.

See http://msdn.microsoft.com/en-us/library/aa302290.aspx for more details.

XML 1.1 relaxes this restriction, but .NET only supports XML 1.0.

Richard Schneider