4

I have a C# project in Visual Studio that downloads and parses an XML file containing Korean, Chinese, and other Unicode characters. For example, for the Korean artist Taeyang, the service produces XML like this:

<name>태양</name>

but my program returns

<name>??</name>

I have tried a StreamReader with Encoding.Default, but the result is

<name>태양</name>

The code:

string address = String.Format("http://musicbrainz.org/ws/2/artist/{0}?inc=url-rels", mbids[ord]);
HttpWebRequest newRequest = WebRequest.Create(address) as HttpWebRequest;
newRequest.Headers["If-None-Match"] = etagProf;
newRequest.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
var response = newRequest.GetResponse();
// Reader
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream, Encoding.UTF-8);
string data = reader.ReadToEnd();

and the xml source:

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
    <artist type="Person" id="d84e5667-3cbe-4556-b551-9d7e4be95d71">   
        <name>태양</name>
        <sort-name>Taeyang</sort-name><gender>Male</gender>
        <country>KR</country>
        ...........
    </artist>
</metadata>

I'm confused about why this happens. Any ideas?

Mike

3 Answers

6

Using the code below (notice that I commented out two of your lines)

//newRequest.Headers["If-None-Match"] = "d84e5667-3cbe-4556-b551-9d7e4be95d71";
//newRequest.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";

and changing your line: StreamReader(stream, Encoding.UTF-8);

to: StreamReader(stream, Encoding.UTF8);

I got the correct characters in the output:

string address = String.Format("http://musicbrainz.org/ws/2/artist/{0}?inc=url-rels","d84e5667-3cbe-4556-b551-9d7e4be95d71");
HttpWebRequest newRequest = WebRequest.Create(address) as HttpWebRequest;
//newRequest.Headers["If-None-Match"] = "d84e5667-3cbe-4556-b551-9d7e4be95d71";
//newRequest.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
var response = newRequest.GetResponse();
// Reader
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream, Encoding.UTF8);
string data = reader.ReadToEnd();
MessageBox.Show(data);
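
One possible reason the commented-out Accept-Encoding line matters (an assumption, not something confirmed in this thread): if the server honors the header and sends a gzip-compressed body, reading that body directly as UTF-8 yields garbage unless it is decompressed first. The sketch below simulates this without any network call; the class name GzipDemo and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipDemo
{
    // Compress UTF-8 text, as a server might when it honors "Accept-Encoding: gzip".
    public static byte[] Gzip(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the GZipStream has closed the buffer.
            return buffer.ToArray();
        }
    }

    // Decompress first, then decode the bytes as UTF-8.
    public static string ReadGzipAsUtf8(byte[] body)
    {
        using (var gzip = new GZipStream(new MemoryStream(body), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Decode the still-compressed bytes as UTF-8: the failure mode.
    public static string ReadRawAsUtf8(byte[] body)
    {
        using (var reader = new StreamReader(new MemoryStream(body), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        const string xml = "<name>태양</name>";
        byte[] body = Gzip(xml);
        Console.WriteLine(ReadGzipAsUtf8(body) == xml); // True
        Console.WriteLine(ReadRawAsUtf8(body) == xml);  // False
    }
}
```

With a real HttpWebRequest, setting newRequest.AutomaticDecompression = DecompressionMethods.GZip instead of adding the header by hand lets the framework request gzip and decompress the response stream for you.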
Sagiv b.g
  • I just did it and output 태양 again – Mike Feb 20 '15 at 08:13
  • 1
    @Sag1v: `GetEncoding` is not relied on to determine the *real* text encoding – chouaib Feb 20 '15 at 08:15
  • 1
    @Sag1v - came to the same result. The code runs fine on my machine, too. So if this doesn't solve the problem of Michael Antonio, maybe his OS has some problems handling the UTF-8 code. – netblognet Feb 20 '15 at 09:10
  • 1
    @netblognet could be, but I think the problem resides in the 2 lines I commented out: newRequest.Headers["If-None-Match"] = "d84e5667-3cbe-4556-b551-9d7e4be95d71"; newRequest.Headers[HttpRequestHeader.AcceptEncoding] = "gzip"; – Sagiv b.g Feb 20 '15 at 09:12
  • hi all, i fixed this problem. please look at my answer. thanks for your time ... nice to share :) – Mike Feb 20 '15 at 10:06
0

Try UTF-8 encoding:

StreamReader sr= new StreamReader(file_name, System.Text.Encoding.UTF8);
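
As a minimal sketch of the same idea applied to the XML from the question: the stream is decoded with Encoding.UTF8 and the artist name is read through the MusicBrainz namespace. The class name ArtistNameDemo and the in-memory stream (standing in for the HTTP response stream) are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class ArtistNameDemo
{
    // Decode the stream as UTF-8 and return the text of metadata/artist/name.
    public static string ReadArtistName(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            XDocument doc = XDocument.Load(reader);
            // Elements live in the namespace declared on <metadata>.
            XNamespace mmd = "http://musicbrainz.org/ns/mmd-2.0#";
            XElement artist = doc.Root.Element(mmd + "artist");
            return (string)artist.Element(mmd + "name");
        }
    }

    public static void Main()
    {
        string xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<metadata xmlns=\"http://musicbrainz.org/ns/mmd-2.0#\">" +
            "<artist type=\"Person\" id=\"d84e5667-3cbe-4556-b551-9d7e4be95d71\">" +
            "<name>태양</name><sort-name>Taeyang</sort-name>" +
            "</artist></metadata>";

        // The in-memory stream stands in for response.GetResponseStream().
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            Console.WriteLine(ReadArtistName(stream) == "태양"); // True
        }
    }
}
```

Comparing the string in code, rather than only printing it, helps separate a decoding problem from a display problem.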

chouaib
0

I found that Console.WriteLine() can't display Unicode text properly. Korean, Chinese, and in general any characters other than a-z and 0-9 are not shown as expected, because the console window uses a single font (Raster Fonts).

But the main problem was my DB connection: I forgot to add charset=utf-8 to my connection string.
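
A small sketch of why a non-Unicode console or connection turns the name into question marks: when text is encoded to a character set that cannot represent it, each unsupported character is replaced with "?". Encoding.ASCII is used here only as a stand-in for whatever character set the console or database connection was actually using, and the class name LossyEncodingDemo is hypothetical.

```csharp
using System;
using System.Text;

public static class LossyEncodingDemo
{
    // Encode and decode with the same encoding, as happens when text passes
    // through a channel that is limited to that character set.
    public static string RoundTrip(string text, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        const string name = "<name>태양</name>";

        // ASCII cannot represent Hangul, so each character becomes '?'.
        Console.WriteLine(RoundTrip(name, Encoding.ASCII)); // <name>??</name>

        // UTF-8 preserves the text exactly.
        Console.WriteLine(RoundTrip(name, Encoding.UTF8) == name); // True
    }
}
```

For console output specifically, setting Console.OutputEncoding = Encoding.UTF8 and choosing a console font that contains the needed glyphs may help, though the result depends on the terminal.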

nathanchere
Mike