
I'm using some P/Invoke calls (specifically GetWindowText()), and occasionally I get a string containing invalid Unicode code points (probably due to bugs in the program whose window I'm inspecting). When I later try to write that string to XML, I get an exception. I'd like to check for these invalid characters beforehand and, if possible, remove them from the string.

Unfortunately, I cannot find anything among .NET's built-in functions that would let me do that. Did I miss something?

Vilx-
  • It looks like the validation rules specified [here](http://unicode.org/faq/utf_bom.html#utf16-7) for Unicode are different from the allowable characters in XML described in [this](http://en.wikipedia.org/wiki/Valid_characters_in_XML) Wikipedia article. A brief search didn't turn up any regular expressions for validating Unicode vs. XML. Have you looked at [XmlConvert.IsXmlChar](http://msdn.microsoft.com/en-us/library/system.xml.xmlconvert.isxmlchar.aspx) and [XmlConvert.VerifyXmlChars](http://msdn.microsoft.com/en-us/library/system.xml.xmlconvert.verifyxmlchars.aspx)? – HABO Aug 14 '12 at 18:57
  • Call String.Normalize(), that's likely to trigger the exception. – Hans Passant Aug 14 '12 at 20:59
  • @HansPassant - Yes, it does. I just wondered if there was something that doesn't throw an exception. And also I'd like to clean up the string and salvage as much as possible. The result doesn't need to be machine-accurate. It will be for humans to read, so imperfections are acceptable. – Vilx- Aug 14 '12 at 21:22
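
For illustration, the approach suggested in the comments (filtering with XmlConvert.IsXmlChar rather than relying on an exception) might look roughly like the following. This is only a minimal sketch, assuming .NET 4.0 or later, where XmlConvert.IsXmlChar is available. The class name XmlTextCleaner and the method StripInvalidXmlChars are hypothetical names made up for this example, not part of .NET. It keeps well-formed surrogate pairs, drops unpaired surrogates, and drops any other character that XML 1.0 does not allow.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper (not a framework type): salvages as much of a string
// as possible by dropping characters that cannot be written to XML 1.0.
static class XmlTextCleaner
{
    public static string StripInvalidXmlChars(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var result = new StringBuilder(input.Length);

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];

            // A high surrogate immediately followed by a low surrogate is a
            // well-formed pair encoding one code point above U+FFFF: keep both.
            if (char.IsHighSurrogate(c)
                && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                result.Append(c).Append(input[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }
            // Any surrogate reaching this branch is unpaired and is dropped.
            // Everything else is kept only if XML 1.0 allows it.
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                result.Append(c);
            }
        }

        return result.ToString();
    }
}
```

Called as `XmlTextCleaner.StripInvalidXmlChars(windowTitle)` before writing to XML, this silently discards the offending characters. Since the output is meant for humans to read, appending a placeholder such as '?' instead of dropping the character would be an equally reasonable choice.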

0 Answers