I know \u{...} can be used to specify a Unicode codepoint in hexadecimal.

However, "\x86" != "\u{86}", while "\x7F" == "\u{7F}".

What's the difference between the \x and \u escape sequences?

dgo.a
  • 2,634
  • 23
  • 35

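The "\xNN" escape inserts a single raw byte, given in hexadecimal. The "\uNNNN" and "\u{NN}" escapes insert a Unicode codepoint, given in hexadecimal, and store it in the string as UTF-8 (Crystal strings are UTF-8).

ASCII only goes up to 127 in decimal (hex 7F), and in UTF-8 the bytes 0x00 to 0x7F encode exactly those ASCII characters, so in that range a single byte and a codepoint are the same thing. That explains why codepoint 126 (hex 7E, the "~" character) works with the \xNN escape, but 128 (hex 80) and above (hex 85, 86, etc.) do not: a lone byte in that range is not valid UTF-8, whereas "\u{86}" is stored as the two-byte UTF-8 sequence C2 86.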
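A minimal Crystal sketch of the byte-versus-codepoint difference, using only the standard `String#bytesize`, `String#bytes` and `String#valid_encoding?`. The expected values in the comments assume Crystal's UTF-8 string representation described above:

```crystal
# \xNN stores exactly one raw byte; \u{NN} stores the UTF-8 encoding of U+00NN.

# ASCII range (0x00-0x7F): both escapes yield the same single byte.
p "\x7F".bytesize    # => 1
p "\u{7F}".bytesize  # => 1
p "\x7F" == "\u{7F}" # => true

# 0x80 and above: one lone byte vs. a two-byte UTF-8 sequence (0xC2 0x86).
p "\x86".bytesize                      # => 1
p "\u{86}".bytesize                    # => 2
p "\u{86}".bytes == [0xC2_u8, 0x86_u8] # => true
p "\x86" == "\u{86}"                   # => false

# A lone 0x86 byte is not valid UTF-8, while the \u form is.
p "\x86".valid_encoding?   # => false
p "\u{86}".valid_encoding? # => true
```

Comparing byte counts rather than printed characters makes the difference visible even when the character itself is non-printable.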

You can see what's going on "underneath" by using the String#codepoints method.

puts "\x7E".inspect    # "~"
puts "\x7E".codepoints # [126]
# =====================================
puts "\x80".inspect    # "\x80" (A lone byte 0x80 is not valid UTF-8.)
puts "\x80".codepoints # [65533]
# =====================================
puts "\u{80}".inspect    # "\u0080" (Valid Unicode codepoint.)
puts "\u{80}".codepoints # [128]

When decoding the string, Crystal reports each invalid byte as codepoint 65533 (U+FFFD), the Unicode replacement character, which basically says "this is invalid for this encoding".
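A short Crystal sketch, assuming the standard library's `String#valid_encoding?`, `String#scrub` and `Char::REPLACEMENT`, showing that the invalid byte is still stored in the string and is only reported as U+FFFD when the string is decoded:

```crystal
# "\x80" holds one raw byte that is not valid UTF-8 on its own.
s = "\x80"
p s.bytesize        # => 1
p s.valid_encoding? # => false

# Decoding methods such as #codepoints report that byte as U+FFFD.
p s.codepoints          # => [65533]
p Char::REPLACEMENT.ord # => 65533

# #scrub returns a copy in which the invalid byte really is replaced.
fixed = s.scrub
p fixed.valid_encoding? # => true
p fixed.bytesize        # => 3 (U+FFFD takes three bytes in UTF-8)
```

Calling `scrub` is one way to turn such a string into valid UTF-8 before passing it on, at the cost of losing the original byte.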