c# MemoryStream Encoding Vs. Encoding.GetChars()

Question

I am trying to copy a byte stream from a database, decode it, and finally display it on a web page. However, I am seeing different behavior depending on how I decode the content (note: I am using the "Western European" encoding, which has a Latin character set and does not support Chinese characters):

var encoding = Encoding.GetEncoding(1252 /*Western European*/);
using (var fileStream = new StreamReader(new MemoryStream(content), encoding))
{
    var str = fileStream.ReadToEnd();
}

Vs.

var encoding = Encoding.GetEncoding(1252 /*Western European*/);
var str = new string(encoding.GetChars(content));

If the content contains Chinese characters, then the first block of code produces a string like "D$教学而设计的", which is incorrect because the encoding shouldn't support those characters, while the second block produces "D$æ•™å¦è€Œè®¾è®¡çš„", which is correct because those characters are all in the Western European character set.

What is the explanation for this difference in behavior?

score 11 · Accepted Answer · edited Jun 09 '16 at 11:51


By default, StreamReader looks for a byte order mark (BOM) at the start of the stream and, if it finds one, sets its encoding from it, even if you pass a different encoding to the constructor.

It sees the UTF-8 BOM in your data and correctly uses UTF-8.

To prevent this behavior, pass false as the third argument (detectEncodingFromByteOrderMarks):

var fileStream = new StreamReader(new MemoryStream(content), encoding, false);
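
As a rough illustration of the difference, the sketch below decodes the same bytes three ways. The class name BomDemo, the helper ReadWithStreamReader, and the sample bytes are made up for this example. It also assumes ISO-8859-1 (code page 28591) in place of 1252 so that it runs on .NET Core without registering the code pages provider; the BOM behavior should be the same for either encoding.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: decodes bytes through a StreamReader,
    // with BOM detection switched on or off.
    public static string ReadWithStreamReader(byte[] content, Encoding encoding, bool detectBom)
    {
        using (var reader = new StreamReader(new MemoryStream(content), encoding, detectBom))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for code page 1252 here.
        var encoding = Encoding.GetEncoding(28591);

        // Sample data: a UTF-8 BOM followed by UTF-8 text containing Chinese characters.
        byte[] bom = { 0xEF, 0xBB, 0xBF };
        byte[] content = bom.Concat(Encoding.UTF8.GetBytes("D$\u6559\u5B66")).ToArray();

        string detected = ReadWithStreamReader(content, encoding, true);
        string forced = ReadWithStreamReader(content, encoding, false);
        string direct = new string(encoding.GetChars(content));

        // With detection on, the BOM overrides the encoding that was passed in.
        Console.WriteLine(detected == "D$\u6559\u5B66");
        // With detection off, StreamReader agrees with Encoding.GetChars.
        Console.WriteLine(forced == direct);
    }
}
```

Both comparisons should print True. With detection switched off, the BOM bytes are no longer consumed, so they are decoded as ordinary characters at the start of the string, just as they are with Encoding.GetChars.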

edited Jun 09 '16 at 11:51

RooiWillie


answered Nov 02 '12 at 13:59

SLaks


Thanks! Now they produce the same string. Out of curiosity, which block of code do you suggest is better to use? Are there any advantages or disadvantages to either? – Sidawy Nov 02 '12 at 14:05
