
I'm sending a request with HttpClient to a remote endpoint. I want to download the content and save it to a file as a UTF-8 string.

If the server responded with the proper Content-Type, text/plain; charset=utf-8, the following code would process it just fine:

HttpClient client = new();

HttpResponseMessage res = await client.GetAsync(url);
string text = await res.Content.ReadAsStringAsync();

File.WriteAllText("file.txt", text);

However, the server always returns the bare Content-Type text/plain, without a charset, and I'm unable to read the body as a UTF-8 string.

HttpClient cl = new();

HttpResponseMessage res = await cl.GetAsync(url);

string attempt1 = await res.Content.ReadAsStringAsync();

string attempt2 = Encoding.UTF8.GetString(await res.Content.ReadAsByteArrayAsync());

Stream stream = await res.Content.ReadAsStreamAsync();
byte[] bytes = ((MemoryStream)stream).ToArray();
string attempt3 = Encoding.UTF8.GetString(bytes);

I tried all three of these approaches, and all of them produced scrambled characters, apparently because of an encoding mismatch. I don't have control over the server, so I can't change the headers.

Is there any way to force HttpClient to parse it as UTF-8? Why are the manual approaches not working?

I've built a Cloudflare worker that demonstrates this behavior so you can debug it easily:
https://headers.briganreiz.workers.dev/charset-in-header
https://headers.briganreiz.workers.dev/no-charset

Edit: It turned out to be GZip compression on the main server, which I hadn't noticed. The bytes I was decoding were still compressed, so the charset was never the problem. This question solved it for me: Decompressing GZip Stream from HTTPClient Response
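To illustrate the cause described in the edit, here is a minimal sketch, not the exact code from the linked question. The class and method names (GzipUtf8Demo, Compress, DecompressToString, CreateClient, DownloadUtf8Async) are hypothetical; GZipStream and HttpClientHandler.AutomaticDecompression are the standard .NET APIs. It shows that UTF-8 decoding only works once the body has been decompressed, and one way to let HttpClient do the decompression itself.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class GzipUtf8Demo
{
    // Gzip-compresses a UTF-8 string; stands in for a compressed response body.
    public static byte[] Compress(string text)
    {
        using var buffer = new MemoryStream();
        using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
        {
            byte[] raw = Encoding.UTF8.GetBytes(text);
            gzip.Write(raw, 0, raw.Length);
        }
        // ToArray is still valid after the GZipStream has closed the buffer.
        return buffer.ToArray();
    }

    // Decompresses gzip bytes first, then decodes the result as UTF-8.
    public static string DecompressToString(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var reader = new StreamReader(gzip, Encoding.UTF8);
        return reader.ReadToEnd();
    }

    // Asks HttpClient to decompress gzip/deflate bodies before handing them over.
    public static HttpClient CreateClient()
    {
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        return new HttpClient(handler);
    }

    // Downloads the body as bytes and decodes it as UTF-8, ignoring the header charset.
    public static async Task<string> DownloadUtf8Async(HttpClient client, string url)
    {
        byte[] bytes = await client.GetByteArrayAsync(url);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

With a client from CreateClient, the second approach from the question (ReadAsByteArrayAsync followed by Encoding.UTF8.GetString) should behave as expected, assuming the server really does send UTF-8 text.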

1 Answer


I find this works well with a different pair of classes, WebRequest and HttpWebResponse. I have not added any plumbing for resp.StatusCode and so on, although presuming that everything went well is obviously a tad naive. Give it a go; I suspect you'll find WebRequest and HttpWebResponse more capable for dynamic requests.

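// Requires: using System.IO; using System.Net; using System.Text;
var req = WebRequest.CreateHttp(url);

// Block until the response arrives or the timeout elapses (debugging convenience only).
var getResponse = req.GetResponseAsync();
getResponse.Wait(ResponseTimeoutMilliseconds);

using var resp = (HttpWebResponse)getResponse.Result;

// Decode the body explicitly as UTF-8, regardless of the Content-Type header.
string content;
using (Stream responseStream = resp.GetResponseStream())
using (var reader = new StreamReader(responseStream, Encoding.UTF8))
{
    content = reader.ReadToEnd();
}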

Once you have things working, you should absolutely switch to the ...Async versions. For debugging, though, since we have already waited for the response, I find it more convenient to simply step through the synchronous calls. Feel free to skip that middle step :)
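The fully asynchronous variant mentioned above could look roughly like this. It is a sketch only: AsyncResponseReader, ReadUtf8Async and FetchAsync are hypothetical names, it drops the timeout that the Wait call provided, and it does not check the status code. Newer .NET versions also mark WebRequest as obsolete in favor of HttpClient, so the compiler may warn about it.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading.Tasks;

public static class AsyncResponseReader
{
    // Reads any stream to the end as UTF-8 without blocking the calling thread.
    public static async Task<string> ReadUtf8Async(Stream stream)
    {
        using var reader = new StreamReader(stream, Encoding.UTF8);
        return await reader.ReadToEndAsync();
    }

    // Awaits the response instead of calling Wait() and Result.
    public static async Task<string> FetchAsync(string url)
    {
        HttpWebRequest req = WebRequest.CreateHttp(url);
        using var resp = (HttpWebResponse)await req.GetResponseAsync();
        using Stream body = resp.GetResponseStream();
        return await ReadUtf8Async(body);
    }
}
```

Calling it is then a single line, for example string content = await AsyncResponseReader.FetchAsync(url);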

T. Nielsen