33

I'm creating an XDocument like this:

XDocument doc = new XDocument(
new XDeclaration("1.0", "utf-8", "yes"));

When I save the document with doc.Save(@"c:\tijd\file2.xml"), I get this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

That is fine.

But I want to return the content as an XML string, and I found the following code:

var wr = new StringWriter();
doc.Save(wr);
string s = wr.GetStringBuilder().ToString();

This code works, but the string 's' then starts with this:

<?xml version="1.0" encoding="utf-16" standalone="yes"?>

So the declared encoding changed from UTF-8 to UTF-16, which is not what I want, because now I can't open it in Internet Explorer.

Is there a way to prevent this behaviour?

Michel
  • 2
    There's a Big Red Flag here, the string writer really does contain a utf-16 encoded string. Even if you override the Encoding property. How does this get from the StringWriter into IE? – Hans Passant Mar 09 '11 at 16:54
  • Good question. I save the string 's' to a file with File.WriteAllText and then open it with IE. Didn't specify that too clearly in my question... – Michel Mar 09 '11 at 18:41
  • 1
    Right, the File.WriteAllText() call is the one that *really* determines the encoding. Default is utf-8 unless you use the overload that takes an Encoding. – Hans Passant Mar 09 '11 at 18:47
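To illustrate the point made in the comments above: the bytes on disk are decided by the File.WriteAllText call, not by the StringWriter, so the encoding passed there should agree with the one named in the XML declaration. A minimal sketch, assuming the XML string already declares utf-8; the class and method names (XmlFileWriter, WriteUtf8) are made up for this example:

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original question or comments.
public static class XmlFileWriter
{
    // Writes the XML text as UTF-8 bytes so that the file content agrees
    // with an encoding="utf-8" declaration inside the string.
    public static void WriteUtf8(string path, string xml)
    {
        // new UTF8Encoding(false) means "UTF-8 without a byte order mark".
        // Encoding.UTF8 would also work; it adds a BOM at the start of the file.
        File.WriteAllText(path, xml, new UTF8Encoding(false));
    }
}
```

If the string still declares utf-16, passing Encoding.Unicode to File.WriteAllText instead would keep the declaration and the bytes consistent.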

3 Answers

43

StringWriter advertises itself as using UTF-16. It's easy to fix though:

public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding { get { return Encoding.UTF8; } }
}

That should be enough in your particular case. A rather more well-rounded implementation would:

  • Have constructors matching those in StringWriter
  • Allow the encoding to be specified in the constructor too
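The two points above could be sketched roughly as follows. This is only an illustration, not code from the original answer: the names EncodingStringWriter, XmlStringDemo and ToXmlString are invented here, and the constructors simply mirror the four public StringWriter constructors with an extra Encoding parameter.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical name: a StringWriter that reports a caller-chosen encoding.
public class EncodingStringWriter : StringWriter
{
    private readonly Encoding _encoding;

    public EncodingStringWriter(Encoding encoding)
    {
        _encoding = Check(encoding);
    }

    public EncodingStringWriter(IFormatProvider formatProvider, Encoding encoding)
        : base(formatProvider)
    {
        _encoding = Check(encoding);
    }

    public EncodingStringWriter(StringBuilder sb, Encoding encoding)
        : base(sb)
    {
        _encoding = Check(encoding);
    }

    public EncodingStringWriter(StringBuilder sb, IFormatProvider formatProvider, Encoding encoding)
        : base(sb, formatProvider)
    {
        _encoding = Check(encoding);
    }

    // XDocument.Save asks the writer for this value when it emits the declaration.
    public override Encoding Encoding { get { return _encoding; } }

    private static Encoding Check(Encoding encoding)
    {
        if (encoding == null) throw new ArgumentNullException("encoding");
        return encoding;
    }
}

// Hypothetical usage helper.
public static class XmlStringDemo
{
    public static string ToXmlString(XDocument doc)
    {
        using (var writer = new EncodingStringWriter(Encoding.UTF8))
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

With this, a document that has a root element should come back with encoding="utf-8" in its declaration. The returned .NET string is still UTF-16 in memory; only the advertised encoding changes.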
Jon Skeet
  • 1
    Ah ok, so the StringWriter makes it UTF-16. I always try to understand the encoding stuff, but it just doesn't seem to stick. It sounds logical that a UTF-16 object like the StringWriter creates a UTF-16 string, but what struck me was that it also changed the declaration in the XML file. I didn't think it was caused by the StringWriter, because I was always struggling with UTF-16 too when I was using XmlDocument before XDocument, so I thought it was just a .NET habit or something. So thanks for the answer! – Michel Mar 09 '11 at 18:39
  • 4
    @Michel: Basically the Save method *asks* the writer what encoding to use, so that it'll use whatever's appropriate. It's a bit of a mess, I agree... – Jon Skeet Mar 09 '11 at 18:44
  • 1
    I know this is an old thread, but for others that use this solution, remember when you new up the object you need to use: var wr = new Utf8StringWriter (); – SDanks Dec 18 '17 at 19:22
  • @SDanks: Just using `TextWriter wr = new Utf8StringWriter();` would be fine. It's not clear what you're trying to emphasize - there's nothing particularly odd about this. – Jon Skeet Dec 18 '17 at 22:15
3

Very good answer using inheritance. Just remember that constructors are not inherited, so add any you need, for example one that takes a StringBuilder:

public class Utf8StringWriter : StringWriter
{
    public Utf8StringWriter(StringBuilder sb) : base(sb)
    {
    }

    public override Encoding Encoding { get { return Encoding.UTF8; } }
}
Sebastian Castaldi
1

You will need to set the StreamWriter.Encoding to use UTF-8 instead of Unicode (UTF-16)

Seeing as it's not a StreamWriter this answer is only left for posterity.

msarchet
  • 1
    There's no StreamWriter involved here. There's only a StringWriter, and you can't programmatically set the encoding of that - you have to do it via inheritance :( – Jon Skeet Mar 09 '11 at 16:47
  • Wow yea, I totally misread what was being used. Inheritance it is. – msarchet Mar 09 '11 at 16:48