
I'm struggling with the usual conversion issue, but unfortunately I haven't been able to find anything for my specific problem.

My app receives a System.Net.Http.HttpResponseMessage from a PHP server. The body is UTF-8 encoded and contains sequences like \u00c3\u00a0 (which should be à), and I'm not able to convert them.

string message = await result.Content.ReadAsStringAsync();
byte[] messageBytes = Encoding.UTF8.GetBytes(message);
string newmessage = Encoding.UTF8.GetString(messageBytes, 0, messageBytes.Length);

This is just one of my attempts, but it changes nothing: the resulting string still contains the \u00c3\u00a0 characters.
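The attempt above cannot change anything, because encoding a well-formed string to UTF-8 and decoding it again with UTF-8 gives back the same string. A minimal sketch of that round trip (the class and method names here are hypothetical, chosen for illustration):

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Encodes to UTF-8 bytes and decodes them again with the same encoding.
    public static string RoundTrip(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
    }

    public static void Main()
    {
        string input = "Comunit\u00c3\u00a0";
        // The round trip is an identity operation for well-formed strings.
        Console.WriteLine(RoundTrip(input) == input); // True
    }
}
```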

I have also read answers such as "How to convert a UTF-8 string into Unicode?", but that solution doesn't work for me. This is the solution code:

public static string DecodeFromUtf8(this string utf8String)
{
   // copy the string as UTF-8 bytes.
   byte[] utf8Bytes = new byte[utf8String.Length];
   for (int i=0;i<utf8String.Length;++i) {
      //Debug.Assert( 0 <= utf8String[i] && utf8String[i] <= 255, "the char must be in byte's range");
      utf8Bytes[i] = (byte)utf8String[i];
   }

   return Encoding.UTF8.GetString(utf8Bytes,0,utf8Bytes.Length);
}

DecodeFromUtf8("d\u00C3\u00A9j\u00C3\u00A0"); // déjà

I have noticed that when I try the above solution with a simple string literal like

string str = "Comunit\u00c3\u00a0";

the DecodeFromUtf8 method works perfectly. The problem only appears when I use my response message.
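One possible explanation for the difference, offered as an assumption rather than something verified against the actual response: in a C# string literal the compiler turns \u00c3 into a single character, whereas a raw response body may contain the six literal characters backslash, u, 0, 0, c, 3. The sketch below uses a verbatim string to stand in for such a body (the class name is hypothetical):

```csharp
using System;

public static class LiteralVsRawDemo
{
    public static void Main()
    {
        // Regular literal: the compiler resolves each \uXXXX escape to one char.
        string compiled = "Comunit\u00c3\u00a0";

        // Verbatim literal: backslashes stay literal, as they would in raw JSON text.
        string raw = @"Comunit\u00c3\u00a0";

        Console.WriteLine(compiled.Length); // 9
        Console.WriteLine(raw.Length);      // 19
        Console.WriteLine(raw[7] == '\\');  // True
    }
}
```

If the response looks like the second string, a byte-level conversion such as DecodeFromUtf8 has no escaped characters to work on until the escapes are resolved first.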

Any advice would be greatly appreciated.

Aenima

2 Answers


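I solved this problem myself. I discovered that the server response was UTF-8 JSON whose bytes had been interpreted as ISO-8859-1 and then written out as JSON escape sequences. So I had to resolve the JSON escape sequences first, then take the ISO-8859-1 bytes of the result and decode them as UTF-8.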

So I had to do the following:

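private async Task<string> ResponseMessageAsync(HttpResponseMessage result)
{
    string message = await result.Content.ReadAsStringAsync();
    // Resolve the literal \uXXXX escape sequences into characters.
    string parsedString = Regex.Unescape(message);
    // Map each character back to the single byte it stands for in ISO-8859-1.
    byte[] isoBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(parsedString);
    // Decode those bytes as the UTF-8 they originally were.
    return Encoding.UTF8.GetString(isoBytes, 0, isoBytes.Length);
}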
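To make the three steps concrete, here is a small self-contained sketch that applies them to a sample string instead of a live response. The class name, the Repair method, and the sample input are hypothetical. Regex.Unescape is designed for regular-expression escapes rather than JSON, so for arbitrary bodies a JSON parser may be the safer choice.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class MojibakeDemo
{
    public static string Repair(string escaped)
    {
        // Step 1: the six characters \u00c3 become the single char U+00C3.
        string unescaped = Regex.Unescape(escaped);

        // Step 2: ISO-8859-1 maps U+0000..U+00FF to the byte of the same value,
        // so U+00C3 U+00A0 become the bytes 0xC3 0xA0.
        byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(unescaped);

        // Step 3: 0xC3 0xA0 is the UTF-8 encoding of U+00E0 (à).
        return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
    }

    public static void Main()
    {
        // Verbatim string: stands in for a raw body with literal escapes.
        string raw = @"Comunit\u00c3\u00a0";
        Console.WriteLine(Repair(raw) == "Comunit\u00e0"); // True
    }
}
```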
Aenima
  • I had a similar issue. The vendor said the content was UTF-8, and the Content-Type header also said UTF-8, but I had to use the "ISO-8859-1" encoding to read the response content before I got the correct characters returned. – Andrew Hawes Dec 11 '19 at 13:31
  • What actually worked in my case was the reading of the iso bytes and getting the UTF8 string, without actually calling Encoding.Convert – syonip Jun 10 '20 at 13:45
  • That's exactly what I was looking for, thanks! – Luiz Vaz Aug 13 '20 at 17:29

What worked for me was changing from:

string message = await result.Content.ReadAsStringAsync();
byte[] messageBytes = Encoding.UTF8.GetBytes(message);
string newmessage = Encoding.UTF8.GetString(messageBytes, 0, messageBytes.Length);

to:

byte[] bytes = await result.Content.ReadAsByteArrayAsync();
Encoding utf8 = Encoding.UTF8;
string newmessage = utf8.GetString(bytes);
Kamil Z