I have a strange problem with XmlSerializer. If the serialized object graph contains a string with a form feed character (0x0C), the serializer writes it without complaint, but it cannot deserialize its own output.

Here is the proof of concept:

using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

class Program
{
  static void Main (string[] args)
  {
    // "\f" is the form feed character (0x0C)
    var original = "test\fbla";

    var stringBuilder = new StringBuilder ();

    using (var writer = new StringWriter (stringBuilder))
    {
      new XmlSerializer (typeof (string)).Serialize (writer, original);
    }
    var serialized = stringBuilder.ToString ();

    string deserialized;
    using (var reader = new StringReader (serialized))
    {
      // this call throws
      deserialized = (string) new XmlSerializer (typeof (string)).Deserialize (reader);
    }

    Console.WriteLine (deserialized);
  }
}

The serialized string is:

<?xml version="1.0" encoding="utf-16"?>
<string>test&#xC;bla</string>

The call to Deserialize fails. This looks like a bug in XmlSerializer, since the serialized string appears to be well-formed. Or am I doing something wrong?

Ondrej Tucny
Gerhard77

1 Answer

That character is technically invalid in XML 1.0, even when it is written as the character reference &#xC;, so the serialized string is not actually well-formed. (A good question is why the writer doesn't throw. Looking at the reference source, the serializer uses an XmlTextWriter instead of an XmlWriter, which I think doesn't check characters by default.) To read the value back, you need to give the serializer an XmlReader that's been told not to check characters:

string deserialized;
XmlReaderSettings settings = new XmlReaderSettings();
// this will make the reader not barf on invalid characters
settings.CheckCharacters = false;
// can't just use a string reader here, otherwise the Serializer
// will use an XmlReader with default settings
using (var reader = XmlReader.Create(new StringReader(serialized), settings)) 
{
    deserialized = (string)new XmlSerializer(typeof(string)).Deserialize(reader);
}
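
To see both behaviours side by side, here is a minimal self-contained sketch. The class name `FormFeedDemo` and the helper `RoundTrip` are made up for illustration, and it assumes the serializer writes the form feed without complaint, as described in the question. The strict read is expected to fail, but the exact exception type is deliberately not asserted here.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static class FormFeedDemo
{
    // Hypothetical helper: serializes a string with XmlSerializer, then reads it
    // back through an XmlReader whose CheckCharacters flag the caller controls.
    public static string RoundTrip(string value, bool checkCharacters)
    {
        var serializer = new XmlSerializer(typeof(string));

        var builder = new StringBuilder();
        using (var writer = new StringWriter(builder))
        {
            serializer.Serialize(writer, value);
        }

        var settings = new XmlReaderSettings { CheckCharacters = checkCharacters };
        using (var reader = XmlReader.Create(new StringReader(builder.ToString()), settings))
        {
            return (string)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        const string original = "test\fbla";

        try
        {
            // Strict reader: expected to reject the &#xC; character reference.
            RoundTrip(original, checkCharacters: true);
            Console.WriteLine("strict read succeeded");
        }
        catch (Exception ex)
        {
            Console.WriteLine("strict read failed: " + ex.GetType().Name);
        }

        // Lenient reader: the form feed should survive the round trip.
        string restored = RoundTrip(original, checkCharacters: false);
        Console.WriteLine("lenient round trip intact: " + (restored == original));
    }
}
```

Passing the flag as a parameter keeps the only difference between the two reads in one place, which makes it easy to confirm that CheckCharacters is what decides the outcome.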

However, if you need to serialize strings that may contain characters that are invalid in XML, you should consider using a different serialization format (BinaryFormatter, JSON, or Protocol Buffers all come to mind, depending on your requirements and consumers). There's no good way to guarantee that your downstream consumers will know they have to allow invalid characters in the XML, and some consumers may not have the option to do so.
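
As a rough illustration of the JSON option, the sketch below round-trips the same string through System.Text.Json. That library is an assumption on my part (it ships with .NET Core 3.0 and later; Json.NET would work along the same lines), and the class name `JsonFormFeedDemo` is made up for the example. JSON defines an escape for form feed, so no reader-side opt-in is needed.

```csharp
using System;
using System.Text.Json;

public static class JsonFormFeedDemo
{
    // Hypothetical helper: serializes a string to JSON text and parses it back.
    public static string RoundTrip(string value)
    {
        string json = JsonSerializer.Serialize(value);
        return JsonSerializer.Deserialize<string>(json);
    }

    public static void Main()
    {
        const string original = "test\fbla";
        string restored = RoundTrip(original);

        // The control character is escaped in the JSON text and restored on read.
        Console.WriteLine("JSON round trip intact: " + (restored == original));
    }
}
```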

Dan Field
  • Glad to help - if this resolved your issue consider marking the check box to help future users find it more easily. – Dan Field Mar 23 '16 at 20:54