1

I have an XML document that was created using UTF-8 encoding. I want to store that document in a SQL Server 2008 xml column, but I understand I need to convert it to UTF-16 in order to do that.

I've tried using XDocument to do this, but I'm not getting valid XML after the conversion. Here is the conversion code I've tried (Utf8StringWriter is a small class that inherits from StringWriter and overrides Encoding):

XDocument xDoc = XDocument.Parse(utf8Xml);
StringWriter writer = new StringWriter();
XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings() 
                { Encoding = writer.Encoding, Indent = true });

xDoc.WriteTo(xml);

string utf16Xml = writer.ToString();

The data in utf16Xml is invalid, and when I try to insert it into the database I get the error:

{"XML parsing: line 1, character 38, unable to switch the encoding"}

However, the initial utf8Xml data is definitely valid and contains all the info I need.

UPDATE: The initial XML is obtained by using XmlSerializer (with a Utf8StringWriter class) to create the XML string from an existing object model (engine). The code for this is:

public static void Serialise<T>(T engine, ref StringWriter writer)
{
    XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings() { Encoding = writer.Encoding });

    XmlSerializer xs = new XmlSerializer(engine.GetType());

    xs.Serialize(xml, engine);
}

I have to leave this as it is, because that code is outside my control.
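
For readers unfamiliar with the pattern, a Utf8StringWriter of the kind described above is usually only a few lines long. The following is a minimal sketch, assuming the class does nothing beyond overriding Encoding; the actual class is not shown in this question, so treat it as a hypothetical reconstruction.

```csharp
using System.IO;
using System.Text;

// Hypothetical reconstruction: the real Utf8StringWriter is not shown in the question.
public sealed class Utf8StringWriter : StringWriter
{
    // StringWriter reports UTF-16 by default. Overriding Encoding makes an
    // XmlWriter created over this writer declare encoding="utf-8" instead.
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}
```

The string held by the writer is still an ordinary .NET string; only the encoding name written into the XML declaration changes.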

Even before I send the utf16Xml string to the failing database call, I can inspect it in the Visual Studio debugger. There I notice that the string is incomplete, and the XML viewer reports a "string literal was not closed" error.

dreza

3 Answers

2

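Set the encoding of the document to UTF-16 after you have parsed it from utf8Xml:

XDocument xDoc = XDocument.Parse(utf8Xml);
// Declaration is null when the source XML has no <?xml ...?> line.
if (xDoc.Declaration != null)
    xDoc.Declaration.Encoding = "utf-16";

StringWriter writer = new StringWriter();
// Dispose the XmlWriter before reading the string so that buffered output is flushed.
using (XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings()
                { Encoding = writer.Encoding, Indent = true }))
{
    xDoc.WriteTo(xml);
}

string utf16Xml = writer.ToString();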
Faraday
  • I just noticed I had the wrong string writer specified in my example. I meant to use only StringWriter, as I want the XML in UTF-16, not UTF-8. Updated my question. – dreza Jun 05 '12 at 01:40
  • @dreza the line xDoc.Declaration.Encoding = "utf-16"; should do the trick for you then :) – Faraday Jun 05 '12 at 01:53
2

The error is on the first line, XDocument xDoc = XDocument.Parse(utf8Xml);. Most likely you converted a UTF-8 stream into a string (utf8Xml), but the encoding specified in the string is still utf-8, so the XML reader fails. If that is the case, load the XML directly from the stream using Load instead of converting it to a string first.
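
As an illustration of loading from a stream rather than a string, here is a minimal sketch. It assumes the raw XML bytes are available; the XmlLoading class and the LoadFromBytes name are hypothetical and are used only for this example.

```csharp
using System.IO;
using System.Xml.Linq;

public static class XmlLoading
{
    // Hypothetical helper: parses XML directly from its bytes, so the reader
    // decodes them using the byte order mark or the declared encoding.
    public static XDocument LoadFromBytes(byte[] xmlBytes)
    {
        using (var stream = new MemoryStream(xmlBytes))
        {
            return XDocument.Load(stream);
        }
    }
}
```

A call might look like XmlLoading.LoadFromBytes(Encoding.UTF8.GetBytes(xmlText)), where xmlText stands in for whatever source actually supplies the document.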

Alexei Levenkov
  • Thanks for the comment. I'm actually given the string by another method that used XmlSerializer to create the XML in the first place, so I don't have access to the stream itself. – dreza Jun 05 '12 at 02:03
  • So look at the first characters: there is likely an "encoding=....". If it is present and set to something other than UTF-16, that is your problem. I'd try to use XmlDocument.LoadXml in this case ... – Alexei Levenkov Jun 05 '12 at 02:08
0

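Here's what I had to do to make it work. This just converts the XML to UTF-16:

string getUtf16Xml(System.Xml.XmlDocument xmlDoc)
{
   System.Xml.Linq.XDocument xDoc = System.Xml.Linq.XDocument.Parse(xmlDoc.OuterXml);

   // Declaration is null when the source has no <?xml ...?> line.
   if (xDoc.Declaration != null)
      xDoc.Declaration.Encoding = "utf-16";

   // Note: XDocument.ToString() omits the XML declaration from its output.
   return xDoc.ToString();
}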

Then I can save the results to the DB.
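
Note that XDocument.ToString() does not include the XML declaration in the string it returns. If the declaration should be kept in the stored text, one option is to save the document through an XmlWriter over a StringWriter. The following is a minimal sketch; the Utf16Xml class and the ToStringWithDeclaration name are hypothetical and are not part of the answer above.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class Utf16Xml
{
    // Hypothetical helper: serializes the document with its XML declaration.
    public static string ToStringWithDeclaration(XDocument doc)
    {
        // StringWriter reports UTF-16, which the XmlWriter puts in the declaration.
        var writer = new StringWriter();

        // Disposing the XmlWriter flushes its buffer into the StringWriter.
        using (XmlWriter xml = XmlWriter.Create(writer))
        {
            doc.Save(xml);
        }

        return writer.ToString();
    }
}
```

Reading the string only after the XmlWriter has been disposed matters here, because output that is still buffered would otherwise be missing from the result.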

Manny