Page 19 of RFC 7616 (https://www.rfc-editor.org/rfc/rfc7616#page-19) has this example of text encoded in UTF-8:

  J  U+00E4 s  U+00F8 n      D  o  e
  4A C3A4   73 C3B8   6E 20 44  6F 65

How do I represent it in a Rust String?

I tried https://mothereff.in/utf-8 and writing J\00E4s\00F8nDoe, but it didn't work.

Gatonito

2 Answers


"Jäsøn Doe" should work fine. Rust source files are always UTF-8 encoded and a string literal may contain any Unicode scalar value (that is, any code point except surrogates, which must not be encoded in UTF-8).
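As a quick check, the sketch below prints the UTF-8 bytes of the literal so they can be compared with the RFC listing. The helper name utf8_hex is made up for this illustration, and it assumes the source file stores ä and ø as the precomposed characters U+00E4 and U+00F8.

```rust
/// Formats the UTF-8 bytes of `s` as uppercase hex pairs separated by spaces.
/// (`utf8_hex` is a hypothetical helper, not part of the standard library.)
fn utf8_hex(s: &str) -> String {
    s.bytes()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Assumes the literal is saved with precomposed U+00E4 and U+00F8.
    let name = "Jäsøn Doe";
    // Same bytes as the RFC 7616 example, written one byte per pair.
    assert_eq!(utf8_hex(name), "4A C3 A4 73 C3 B8 6E 20 44 6F 65");
    println!("{}", utf8_hex(name));
}
```

Note that the RFC groups the two bytes of each non-ASCII character together (C3A4, C3B8), while this output separates every byte.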

If your editor does not support UTF-8 encoding, but supports ASCII, you can use Unicode code point escapes, which are documented in the Rust reference:

A 24-bit code point escape starts with U+0075 (u) and is followed by up to six hex digits surrounded by braces U+007B ({) and U+007D (}). It denotes the Unicode code point equal to the provided hex value.

So the correct syntax is "J\u{E4}s\u{F8}n Doe".
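A minimal sketch of the escape form, using only ASCII in the source, compared against the raw bytes from the RFC example. The function names name_from_escapes and name_from_rfc_bytes are invented for this illustration; String::from_utf8 is the standard library call that validates and decodes a byte vector.

```rust
use std::string::FromUtf8Error;

/// Builds the name using only ASCII characters in the source file.
fn name_from_escapes() -> String {
    String::from("J\u{E4}s\u{F8}n Doe")
}

/// Decodes the raw bytes listed in the RFC 7616 example.
fn name_from_rfc_bytes() -> Result<String, FromUtf8Error> {
    let bytes = vec![
        0x4A, 0xC3, 0xA4, 0x73, 0xC3, 0xB8, 0x6E, 0x20, 0x44, 0x6F, 0x65,
    ];
    String::from_utf8(bytes)
}

fn main() {
    let escaped = name_from_escapes();
    let decoded = name_from_rfc_bytes().expect("the RFC bytes are valid UTF-8");

    // Both routes produce the same string.
    assert_eq!(escaped, decoded);
    // 9 Unicode scalar values, but 11 bytes, since ä and ø take 2 bytes each.
    assert_eq!(escaped.chars().count(), 9);
    assert_eq!(escaped.len(), 11);
    println!("{}", escaped);
}
```

Keep in mind that len() on a String counts bytes, not characters, which is why the two numbers differ here.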

trent

You can refer to Rust By Example, since not everything is covered in the Rust book:

https://doc.rust-lang.org/stable/rust-by-example/std/str.html#literals-and-escapes

You can use the syntax \u{your_unicode}, where your_unicode is the code point in hexadecimal:

fn main() {
    // The space before "Doe" matches the 0x20 byte in the RFC example.
    let unicode_str = String::from("J\u{00E4}s\u{00F8}n Doe");
    println!("{}", unicode_str);
}
Sudhir Dhumal