
I'm having a hard time customizing the output of HttpClient and JsonSerializer when I send (or serialize) an object that contains special characters. The escaped characters come out with different letter case than I expect.

The test case is very simple:

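using System.Text.Json;
using System.Text.Json.Serialization;

var test = new Test
{
   A = "Zażółć gęślą jaźń"
};
// No options passed, so the default encoder is used
var jsonString = JsonSerializer.Serialize(test);

public class Test
{
   [JsonPropertyName("a")]
   public string A { get; set; }
}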

And the equivalent PHP:

$data = array('a'=>'Zażółć gęślą jaźń');
$data = json_encode($data);
echo $data;

C# returns:

{"a":"Za\u017C\u00F3\u0142\u0107 g\u0119\u015Bl\u0105 ja\u017A\u0144"}

PHP:

{"a":"Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144"}

In C# I get \u017C, but in PHP I get \u017c (uppercase C vs. lowercase c in the hex digits).

I need the output to be identical because I have to calculate a checksum and a hash of my request body.
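To illustrate why the hex-digit case matters here, the sketch below (assuming .NET 5 or later for `SHA256.HashData` and `Convert.ToHexString`) parses a shortened version of both outputs: they decode to the same string, but the raw bytes, and therefore any hash over them, differ. The variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Same content, differing only in the case of the hex digits in \uXXXX escapes.
string fromDotNet = "{\"a\":\"Za\\u017C\\u00F3\\u0142\\u0107\"}";
string fromPhp    = "{\"a\":\"Za\\u017c\\u00f3\\u0142\\u0107\"}";

// Decode both documents back to plain strings.
string dotNetValue = JsonSerializer.Deserialize<Dictionary<string, string>>(fromDotNet)!["a"];
string phpValue = JsonSerializer.Deserialize<Dictionary<string, string>>(fromPhp)!["a"];

// Hash the raw bodies, as a checksum over the wire format would.
string dotNetHash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(fromDotNet)));
string phpHash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(fromPhp)));

Console.WriteLine(dotNetValue == phpValue); // True: identical decoded text
Console.WriteLine(dotNetHash == phpHash);   // False: different bytes on the wire
```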

I'm not even sure what to search for, or whether this is possible to do (easily). I've tried using JsonSerializerOptions, but without success.

EDIT
I've just done a quick test with Newtonsoft and the output is as desired:

using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

string newtonsoftOutput = JsonConvert.SerializeObject(test, new JsonSerializerSettings
{
    StringEscapeHandling = StringEscapeHandling.EscapeNonAscii,
    ContractResolver = new CamelCasePropertyNamesContractResolver()
});

So the question is: which options do I need to change in the System.Text.Json serializer and HttpClient to get the same result?
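One possible approach, following dbc's suggestion in the comments, is a custom `JsonConverter<string>` that escapes the string itself and writes the result with `Utf8JsonWriter.WriteRawValue`, which requires .NET 6 or later. The converter name `LowercaseHexEscapingStringConverter` is hypothetical, and its escaping rules are an assumption meant to approximate PHP's default `json_encode` (short escapes for `"`, `\`, `/` and common control characters, lowercase `\uXXXX` for other control characters and all non-ASCII UTF-16 code units). It has not been checked against every character PHP might treat differently.

```csharp
using System;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

var options = new JsonSerializerOptions();
options.Converters.Add(new LowercaseHexEscapingStringConverter());

string json = JsonSerializer.Serialize(new Test { A = "Zażółć gęślą jaźń" }, options);
Console.WriteLine(json);
// {"a":"Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144"}

public class Test
{
    [JsonPropertyName("a")]
    public string A { get; set; }
}

// Hypothetical converter: escapes strings by hand (aiming to mimic PHP's default
// json_encode) and writes them verbatim, bypassing the writer's own encoder.
public sealed class LowercaseHexEscapingStringConverter : JsonConverter<string>
{
    public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => reader.GetString();

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
    {
        var sb = new StringBuilder(value.Length + 2);
        sb.Append('"');
        foreach (char c in value)
        {
            switch (c)
            {
                case '"': sb.Append("\\\""); break;
                case '\\': sb.Append("\\\\"); break;
                case '/': sb.Append("\\/"); break; // json_encode escapes '/' by default
                case '\b': sb.Append("\\b"); break;
                case '\f': sb.Append("\\f"); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                default:
                    if (c < 0x20 || c >= 0x80)
                        // Escape the UTF-16 code unit as four lowercase hex digits ("x4").
                        sb.Append("\\u").Append(((int)c).ToString("x4"));
                    else
                        sb.Append(c);
                    break;
            }
        }
        sb.Append('"');
        // Writes the already-escaped JSON string as-is (.NET 6+).
        writer.WriteRawValue(sb.ToString());
    }
}
```

On the HttpClient side, helpers such as `PostAsJsonAsync` serialize with their own options, so one way to keep the hashed bytes and the sent bytes identical is to serialize once and send that exact string, e.g. `new StringContent(json, Encoding.UTF8, "application/json")`.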

dbc
Misiu
  • Why would you even encode rather than allow full UTF-8? In PHP it would be `JSON_UNESCAPED_UNICODE`, in C# a `JsonSerializerOptions` with `Encoder` set to `JavaScriptEncoder.Create(UnicodeRanges.All)`. You should get the same result. – Wiktor Zychla Oct 22 '21 at 10:36
  • @WiktorZychla the PHP backend isn't under my control, it's external system and there is no option to change that :/ – Misiu Oct 22 '21 at 10:43
  • You can't change it by the looks of things. `JavaScriptEncoder` [uses](https://github.com/dotnet/corefx/blob/bd30a3f458b6b0f71204fc3630b2d29b780c4167/src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/JavaScriptEncoder.cs#L196) a function called [`Hexutil.Int32LsbToHexDigit`](https://github.com/dotnet/corefx/blob/bd30a3f458b6b0f71204fc3630b2d29b780c4167/src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/HexUtil.cs#L30) which uses uppercase, and none of this is overridable. – Charlieface Oct 22 '21 at 11:05
  • Your only option would be to override [`TryEncodeUnicodeScalar`](https://github.com/dotnet/corefx/blob/bd30a3f458b6b0f71204fc3630b2d29b780c4167/src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/JavaScriptEncoder.cs#L121) and reimplement most of the escaping code again – Charlieface Oct 22 '21 at 11:07
  • @Charlieface I'll stick with Newtonsoft if I must override that. I'll create an issue on GitHub, maybe someday this will be customizable. – Misiu Oct 22 '21 at 11:18
  • Another option would be to catch all these `\\u[A-Z0-9]{4}` groups and regex replace them to lower case – Wiktor Zychla Oct 22 '21 at 11:30
  • Interestingly enough Newtonsoft doesn't allow you to change it either – Charlieface Oct 22 '21 at 12:06
  • @Charlieface but it gives me the output I need. – Misiu Oct 22 '21 at 14:25
  • According to the [JSON Spec](https://www.json.org/json-en.html), both uppercase and lowercase letters may be used for hex digits in escaped characters. In .NET 5 your only option to tweak this escaping would seem to be to make your own [`JavaScriptEncoder`](https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.Text.Encodings.Web/src/System/Text/Encodings/Web/JavaScriptEncoder.cs) as mentioned by @Charlieface... – dbc Oct 22 '21 at 18:37
  • But in .NET 6 `Utf8JsonWriter` will have a method [`Utf8JsonWriter.WriteRawValue()`](https://learn.microsoft.com/en-us/dotnet/api/system.text.json.utf8jsonwriter.writerawvalue?view=net-6.0), so there you will be able to create a `JsonConverter` that escapes the incoming string however you want, then writes out the raw result. – dbc Oct 22 '21 at 18:37
  • @WiktorZychla - the danger with the regex approach is that something matching the escape sequence may be in the original unencoded string. I.e. if the original string is `\u017C`, then the escaped JSON will be `"\\u017C"`, so your regex has to be smart enough to do nothing when there is an even number of backslashes... – dbc Oct 22 '21 at 18:50
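A sketch of the regex post-processing idea that takes dbc's caveat into account: it lowercases a `\uXXXX` escape only when the backslash before `u` is preceded by an even number (possibly zero) of backslashes, so literal text such as `\u017C` (serialized as `\\u017C`) is left alone. `LowercaseUnicodeEscapes` is a hypothetical helper name. As far as I know, the default System.Text.Json encoder also escapes characters such as `<`, `>`, `&` and `"` as `\uXXXX`, which PHP does not do by default, so lowercasing alone may still not make the two bodies byte-identical.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

// Group 1: zero or more escaped backslashes ("\\" pairs) not preceded by another backslash.
// Then a real "\u" escape; group 2 captures its four hex digits.
var escapePattern = new Regex(@"(?<!\\)((?:\\\\)*)\\u([0-9A-Fa-f]{4})");

// Hypothetical helper: rewrites only genuine \uXXXX escapes with lowercase hex digits.
string LowercaseUnicodeEscapes(string json) =>
    escapePattern.Replace(json, m => m.Groups[1].Value + "\\u" + m.Groups[2].Value.ToLowerInvariant());

// "b" holds the literal text \u017C, which must survive unchanged.
string serialized = JsonSerializer.Serialize(new { a = "Zażółć", b = @"\u017C" });
Console.WriteLine(serialized);
Console.WriteLine(LowercaseUnicodeEscapes(serialized));
```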
