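using System;

namespace UnicodeRlm
{
    class Program
    {
        static void Main(string[] args)
        {
            // The file name contains an invisible RIGHT-TO-LEFT MARK (U+200F)
            // between "!" and the closing quote; it is spelled as \u200F here
            // so that it survives copy/paste.
            var uri = new Uri(
                "https://example.com/attachments/The title is \"مفتاح معايير الويب!\u200F\" in Arabic.pdf");
            Console.WriteLine(uri.AbsolutePath);
            Console.WriteLine(uri.AbsolutePath.Length);
        }
    }
}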
Under .NET 4.0, this produces
/attachments/The%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8!%E2%80%8F%22%20in%20Arabic.pdf
168
Under .NET 4.5+, this produces
/attachments/The%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8!%22%20in%20Arabic.pdf
159
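.NET 4.5 drops the %E2%80%8F part, which is the percent-encoded UTF-8 form of the RLM (RIGHT-TO-LEFT MARK, U+200F) character. That accounts for exactly the 9-character difference between the two lengths (168 vs. 159):
...!%E2%80%8F%22%20in%20Arabic.pdf   (.NET 4.0)
...!%22%20in%20Arabic.pdf            (.NET 4.5+)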
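To double-check that the missing sequence really is the RLM, the short sketch below percent-encodes the UTF-8 bytes of U+200F. `RlmEncoding` and `PercentEncodeUtf8` are hypothetical names made up for this illustration, and the helper escapes every byte (even ASCII), so it only demonstrates what the escape for a single character looks like. It is not how System.Uri escapes paths.

```csharp
using System;
using System.Text;

static class RlmEncoding
{
    // Hypothetical helper: writes every UTF-8 byte of the input as %XX
    // (uppercase hex). Unlike System.Uri, it escapes ASCII letters too.
    public static string PercentEncodeUtf8(string s)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(s))
            sb.Append('%').Append(b.ToString("X2"));
        return sb.ToString();
    }

    static void Main()
    {
        const string rlm = "\u200F"; // RIGHT-TO-LEFT MARK
        string escaped = PercentEncodeUtf8(rlm);
        Console.WriteLine(escaped);        // %E2%80%8F
        Console.WriteLine(escaped.Length); // 9, the same as 168 - 159
    }
}
```

U+200F lies in the three-byte UTF-8 range, so it becomes the bytes E2 80 8F, and each byte costs three characters once escaped.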
My hypothesis is that this is caused by System.Uri escaping now supporting RFC 3986, but my RFC-fu and Unicode-fu are failing me: I can't tell whether this RFC requires the RLM to be dropped, or whether the RLM is even placed correctly in the original string.
I'm not entirely sure whether this is correct behavior standards-wise, but for me it certainly isn't: in .NET 4.5 I cannot download a file with an RLM character in its name, neither with WebClient nor with HttpWebRequest.
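To see which behavior a given runtime has without involving WebClient or HttpWebRequest, a reduced repro along these lines may help. It is only a sketch: `RlmCheck`, `KeepsRlm` and the `a!\u200F.pdf` URL are hypothetical, and it assumes the RLM is handled the same way in a short path as in the full file name above. It deliberately prints no expected result, since that depends on the framework version.

```csharp
using System;

static class RlmCheck
{
    // Hypothetical helper: true if the parsed path still contains the
    // escaped RLM (the .NET 4.0 output shown above), false if it was
    // dropped (the .NET 4.5+ output shown above).
    public static bool KeepsRlm(string url)
    {
        var uri = new Uri(url);
        return uri.AbsolutePath.Contains("%E2%80%8F");
    }

    static void Main()
    {
        // \u200F spells the otherwise invisible RLM explicitly.
        string url = "https://example.com/attachments/a!\u200F.pdf";
        Console.WriteLine(Environment.Version);
        Console.WriteLine(KeepsRlm(url) ? "RLM kept" : "RLM dropped");
    }
}
```

Running the same binary against different framework versions should show whether the difference comes from System.Uri itself rather than from the HTTP classes.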
Is there any way to work around this quirk?