I know \u{...} can be used to specify a Unicode codepoint in hexadecimal.
However, "\x86" != "\u{86}", but "\x7F" == "\u{7F}".
What's the difference between the \x and \u escape sequences?
The \xNN escape emits a single raw byte given in hexadecimal, which only corresponds to a character for ASCII values. The \uNNNN and \u{NN} escapes specify Unicode codepoints in hexadecimal.
ASCII only goes up to 127 in decimal (hex 7F). That explains why a byte like 0x7E (the "~" character) works, but 0x80 and above (0x85, 0x86, etc.) do not produce valid strings with the \xNN escape sequence: on their own, those bytes are not valid UTF-8.
You can see what's going on "underneath" by using the String#codepoints method.
puts "\x7E".inspect # "~"
puts "\x7E".codepoints # [126]
# =====================================
puts "\x80".inspect # "\x80" (Invalid ASCII codepoint.)
puts "\x80".codepoints # [65533]
# =====================================
puts "\u{80}".inspect # "\u0080" (Valid Unicode codepoint.)
puts "\u{80}".codepoints # [128]
Crystal replaces the invalid codepoints with 65533 (U+FFFD, the Unicode replacement character), which basically means "this byte sequence is invalid in this encoding".
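Put differently: \xNN writes one raw byte, while \u{NN} writes a codepoint's UTF-8 encoding, which takes two bytes from U+0080 upward. Comparing the results of String#bytes makes this visible (a small sketch; the printed values reflect Crystal's UTF-8 string representation):

```crystal
# \x86 embeds the single raw byte 0x86, which is not valid UTF-8 by itself.
puts "\x86".bytes        # => [134]

# \u{86} is codepoint U+0086, encoded in UTF-8 as the two bytes 0xC2 0x86.
puts "\u{86}".bytes      # => [194, 134]

# Different byte sequences, so the strings compare unequal.
puts "\x86" == "\u{86}"  # => false

# Below U+0080, UTF-8 is identical to ASCII, so the two escapes agree.
puts "\x7F" == "\u{7F}"  # => true
```

This is why the boundary sits exactly at 0x7F: it is the last codepoint whose UTF-8 encoding is a single byte.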