I'm sending a request with HttpClient to a remote endpoint. I want to download the content and save it to a file as a UTF-8 string.
If the server responds with the proper Content-Type, text/plain; charset=utf-8, then the following code processes it just fine:
HttpClient client = new();
HttpResponseMessage res = await client.GetAsync(url);
string text = await res.Content.ReadAsStringAsync();
File.WriteAllText("file.txt", text);
However, the server always returns the bare Content-Type text/plain, and I'm unable to read the content as a UTF-8 string:
HttpClient cl = new();
HttpResponseMessage res = await cl.GetAsync(url);
string attempt1 = await res.Content.ReadAsStringAsync();
string attempt2 = Encoding.UTF8.GetString(await res.Content.ReadAsByteArrayAsync());
Stream stream = await res.Content.ReadAsStreamAsync();
byte[] bytes = ((MemoryStream)stream).ToArray();
string attempt3 = Encoding.UTF8.GetString(bytes);
I tried all three of these approaches, and all of them produced scrambled characters due to the encoding mismatch. I don't have control over the server, so I can't change the headers.
Is there any way to force HttpClient to parse it as UTF-8? Why are the manual approaches not working?
I've built a Cloudflare Worker that demonstrates this behavior and makes it easy to debug:
https://headers.briganreiz.workers.dev/charset-in-header
https://headers.briganreiz.workers.dev/no-charset
Edit: It turns out the cause was GZip compression on the main server, which I hadn't noticed. This question solved it for me: Decompressing GZip Stream from HTTPClient Response
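
For anyone who lands here with the same symptom, this is a minimal sketch of that fix. It assumes the server sends the body with Content-Encoding: gzip (or deflate), so the bytes have to be decompressed before they can be decoded as UTF-8. The names Utf8Download, CreateHandler and SaveAsUtf8Async are hypothetical helpers made up for this example; AutomaticDecompression is the HttpClientHandler property that does the work.

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class Utf8Download
{
    // Hypothetical helper: builds a handler that transparently
    // decompresses gzip/deflate response bodies.
    public static HttpClientHandler CreateHandler() => new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    };

    // Hypothetical helper: downloads url and saves the body to path as UTF-8 text.
    public static async Task SaveAsUtf8Async(string url, string path)
    {
        using var client = new HttpClient(CreateHandler());
        using HttpResponseMessage res = await client.GetAsync(url);
        res.EnsureSuccessStatusCode();

        // The handler has already decompressed the body at this point, so the
        // bytes can be decoded as UTF-8 explicitly, regardless of whether the
        // Content-Type header carries a charset.
        byte[] bytes = await res.Content.ReadAsByteArrayAsync();
        string text = Encoding.UTF8.GetString(bytes);

        // File.WriteAllTextAsync without an encoding argument writes UTF-8 without a BOM.
        await File.WriteAllTextAsync(path, text);
    }
}
```

Usage would be something like await Utf8Download.SaveAsUtf8Async(url, "file.txt");. Without the decompression setting, all three attempts above were decoding compressed bytes, which is why every one of them came out scrambled no matter which encoding was used.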