0

What's the standard way of serializing a UTF-8 string in JSON? Should it use a \u escape sequence, or the hex code of the UTF-8 bytes?

I want to serialize some sensor readings with units in JSON format.

For example I have temperature readings with units °C. Should it be serialized as

{
 "units": "\u00b0"
}

or should it be something like

{
 "units":"c2b0"
}

Or are both of these supported by the standard?

Nicol Bolas

2 Answers

6

If JSON is exchanged between systems, it must be encoded as UTF-8 (see RFC 8259, section 8.1). UTF-16 and UTF-32 are no longer allowed. So there is no need to escape the degree character, and I strongly recommend against escaping unnecessarily.

Correct and recommended

{
  "units": "°C"
}

Of course, the file or message must actually be encoded as UTF-8, so the degree sign ends up as the two bytes C2 B0.
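
As a rough illustration of what proper UTF-8 encoding means here, the following Python sketch serializes the reading without escaping and then encodes the text as UTF-8. The `reading` dictionary is a made-up payload; `ensure_ascii=False` is the standard-library option that keeps non-ASCII characters unescaped.

```python
import json

# Hypothetical sensor payload, used only for illustration
reading = {"units": "°C"}

# ensure_ascii=False keeps the degree sign as a literal character
text = json.dumps(reading, ensure_ascii=False)

# Encode the JSON text as UTF-8 before writing it to a file or socket
payload = text.encode("utf-8")

print(text)     # {"units": "°C"}
print(payload)  # b'{"units": "\xc2\xb0C"}'
```

The `\xc2\xb0` in the second line of output is only how Python displays the two raw bytes; it is not part of the JSON text.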

If JSON is used in a closed ecosystem, you can use other text encodings (though I would recommend against it unless you have a very good reason). If you need to escape the degree character in your non-UTF-8 encoding, the correct escaping sequence is \u00b0.

Possible but not recommended

{
  "units": "\u00b0C"
}
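
A quick way to convince yourself that the escaped and unescaped forms are equivalent is to parse both. This is a minimal Python sketch using only the standard `json` module; the variable names are arbitrary.

```python
import json

# The degree sign written literally
literal = json.loads('{"units": "°C"}')

# The same character written as a JSON \u escape
# (raw string, so Python passes the backslash through to the JSON parser)
escaped = json.loads(r'{"units": "\u00b0C"}')

# Both documents decode to the same value
assert literal == escaped
print(escaped["units"])  # °C
```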

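Your second approach is incorrect under all circumstances: a JSON parser treats "c2b0" as the four characters c, 2, b, 0 and never decodes them as UTF-8 bytes.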

Incorrect

{
  "units":"c2b0"
}

It is also incorrect to use something like "\xc2\xb0". This is the escaping used in C/C++ source code. It is also used by debuggers to display strings. In JSON, it is always invalid.

Incorrect as well

{
    "units":"\xc2\xb0"
}
Codo
  • Is it also possible to use something like `{"units":"\xc2\xb0"}`? – D_wanna_B_coder Apr 05 '19 at 14:26
  • The JSONCPP library I am using automatically converts the string containing °C to \u00b0 C – D_wanna_B_coder Apr 05 '19 at 14:28
  • What a shame. The intention is probably to produce pure ASCII text, so that the encoding of the string matters less. C and C++ have poor built-in support for string encodings. And `{"units":"\xc2\xb0"}` is always incorrect: that's C/C++ syntax. Your debugger might display a UTF-8 encoded string like this. – Codo Apr 05 '19 at 14:39
  • [RFC 8259](https://tools.ietf.org/html/rfc8259) (December 2017) drops UTF-16 and UTF-32 as standard (for inter-system exchange). Writers at least should start following this. – Tom Blodget Apr 07 '19 at 15:12
  • Thanks for this additional info. That's a wise move. – Codo Apr 07 '19 at 15:44
1

JSON text is made of Unicode characters, and the specification lets you write a character as a \uXXXX escape sequence. This is useful for characters that your environment cannot represent natively, so it's perfectly valid to include such escape sequences and transfer JSON-serialized data as plain ASCII.
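
For instance, Python's standard `json` module escapes every non-ASCII character by default, which yields text that fits in plain ASCII. The snippet below is only a sketch of that idea; the names `ascii_text` and `wire` are arbitrary.

```python
import json

# ensure_ascii=True is the default, so the degree sign becomes a \u escape
ascii_text = json.dumps({"units": "°C"})
print(ascii_text)  # {"units": "\u00b0C"}

# The serialized text contains ASCII characters only
assert ascii_text.isascii()

# It survives an ASCII-only channel and parses back to the original value
wire = ascii_text.encode("ascii")
assert json.loads(wire) == {"units": "°C"}
```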

Luis Colorado