We have an app that collects data electronically and by user input. The data is eventually turned into XML. We have had problems with invalid XML characters in the inbound data: when we turn it into XML, either by serializing objects or by using a .NET transform, the process throws an exception like the one below.

Exception: System.Xml.XmlException: '', hexadecimal value 0x10, is an invalid character. Line 5, position 74.

I don't know of any way to fix this other than scrubbing all the data, either at input time or at the time the XML is created. Running every string input or string property of an object through a cleaning function doesn't sound appealing. Is that how this needs to be resolved?

Looking for confirmation or alternatives.

Thanks, Kevin

  • Catch XML parser exceptions and only put those through your scrubbing routine; you don't need to clean everything. Also record which customers provide you with bad data so you can contact them and get the issues with their input resolved. The purpose of using formats such as XML is that there's an agreed-upon standard; if they're breaking that standard they're effectively breaking their side of a contract - so whilst it's good to have a flexible service, try to put the onus on them to resolve the data they give you. – JohnLBevan Sep 17 '14 at 23:03
  • So does your spec allow data to have these characters or not? Surely you're not changing the allowed data because you've just learned that you can't just insert arbitrary characters into an XML document. – Tom Blodget Sep 17 '14 at 23:33
  • See this [answer](http://stackoverflow.com/a/1647567/2226988) for XML serialization. – Tom Blodget Sep 18 '14 at 00:12

1 Answer

There really isn't an elegant solution for this, but this response has some examples of whitelist cleansers.
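
As a rough illustration of what a whitelist cleanser can look like (a sketch only, not taken from the linked response): the helper below keeps the characters that `XmlConvert.IsXmlChar` accepts, plus valid surrogate pairs, and drops everything else. The names `XmlScrubber` and `StripInvalidXmlChars` are made up for this example, and it assumes .NET 4.0 or later, where these `XmlConvert` methods are available.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper; the class and method names are illustrative only.
public static class XmlScrubber
{
    // Returns a copy of input without characters that XML 1.0 does not allow.
    public static string StripInvalidXmlChars(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Ordinary valid character (tab, CR, LF, and most of the BMP).
                sb.Append(c);
            }
            else if (i + 1 < input.Length &&
                     XmlConvert.IsXmlSurrogatePair(input[i + 1], c))
            {
                // Valid high/low surrogate pair; note the argument order
                // is (lowChar, highChar). Keep both halves.
                sb.Append(c);
                sb.Append(input[i + 1]);
                i++;
            }
            // Anything else (e.g. 0x10, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

Calling something like this at the one place where strings are handed to the serializer or transform, rather than at every input site, keeps the scrubbing in one spot. Whether silently dropping characters is acceptable depends on what your data contract allows.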

Brandon