
I am trying to copy a byte stream from a database, decode it into a string, and finally display it on a web page. However, I am seeing different behavior depending on how I decode the content (note: I am using the "Western European" encoding, code page 1252, which has a Latin character set and does not support Chinese characters):

var encoding = Encoding.GetEncoding(1252 /*Western European*/);
using (var fileStream = new StreamReader(new MemoryStream(content), encoding))
{
    var str = fileStream.ReadToEnd();
}

Vs.

var encoding = Encoding.GetEncoding(1252 /*Western European*/);
var str = new string(encoding.GetChars(content));

If the content contains Chinese characters, then the first block of code will produce a string like "D$教学而设计的", which is incorrect because the encoding shouldn't support those characters, while the second block will produce "D$æ•™å­¦è€Œè®¾è®¡çš„", which is correct as those are all in the Western European character set.

What is the explanation for this difference in behavior?

Sidawy

1 Answer


By default, StreamReader looks for a byte order mark (BOM) at the start of the stream and, if it finds one, uses the encoding the BOM indicates, even if you passed a different encoding to the constructor.

It sees the UTF-8 BOM in your data and correctly decodes it as UTF-8.

To prevent this behavior, pass false as the third argument (detectEncodingFromByteOrderMarks):

var fileStream = new StreamReader(new MemoryStream(content), encoding, false);
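
To see the difference side by side, here is a minimal sketch. The class name BomDemo, its helper methods, and the sample bytes are made up for illustration and are not from the question. It also assumes ISO-8859-1 in place of code page 1252, so that it runs on .NET Core / .NET 5+ without registering the code pages encoding provider; the BOM behavior is the same for any single-byte encoding.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: decode through StreamReader, with BOM detection on or off.
    public static string ReadWithStreamReader(byte[] content, Encoding encoding, bool detectBom)
    {
        using (var reader = new StreamReader(new MemoryStream(content), encoding, detectBom))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper: decode the raw bytes directly; no BOM handling takes place.
    public static string DecodeDirectly(byte[] content, Encoding encoding)
    {
        return new string(encoding.GetChars(content));
    }

    // Assumed sample data: the UTF-8 BOM (EF BB BF) followed by UTF-8 encoded text.
    public static byte[] SampleContent()
    {
        return Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("D$教学"))
            .ToArray();
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for code page 1252 (see the note above).
        var latin1 = Encoding.GetEncoding("iso-8859-1");
        var content = SampleContent();

        // BOM detection on: the reader switches to UTF-8 and ignores latin1.
        var detected = ReadWithStreamReader(content, latin1, true);

        // BOM detection off: the reader really uses latin1, like GetChars does.
        var forced = ReadWithStreamReader(content, latin1, false);
        var direct = DecodeDirectly(content, latin1);

        Console.WriteLine("Detected BOM, decoded as UTF-8: " + (detected == "D$教学"));
        Console.WriteLine("Forced encoding matches GetChars: " + (forced == direct));
    }
}
```

Note that with detection turned off, the three BOM bytes are no longer skipped; they are decoded like any other bytes and show up as extra characters at the start of the string.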
RooiWillie
SLaks
  • Thanks! Now they produce the same string. Out of curiosity, which block of code do you suggest is better to use? Are there any advantages or disadvantages to either? – Sidawy Nov 02 '12 at 14:05