Table of contents
- The context
- The problem
- The question
The context
In the context of R, I'm aware that stringi::stri_unescape_unicode()
could be used for converting a Unicode code to its corresponding character.
For example, the Unicode code points for á
(LATIN SMALL LETTER A WITH ACUTE) and 好
are U+00E1 and U+597D, respectively. This means that I can insert those characters by executing the following.
library(stringi)
stringi::stri_unescape_unicode("\\u00E1")
stringi::stri_unescape_unicode("\\u597D")
[1] "á"
[1] "好"
I'm also aware that characters in the following ranges are for private use. The following quote was retrieved from this glossary (archive) in https://unicode.org.
Private-Use Code Point. Code points in the ranges U+E000..U+F8FF, U+F0000..U+FFFFD, and U+100000..U+10FFFD. (See definition D49 in Section 3.5, Properties.) These code points are designated in the Unicode Standard for private use.
As you can read in the quote, there are three ranges. The following lists those characters that are the limits of those ranges.
- First range: (U+E000)
- First range: (U+F8FF)
- Second range: (U+F0000)
- Second range: (U+FFFFD)
- Third range: (U+100000)
- Third range: (U+10FFFD)
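To make the three ranges concrete, here is a minimal sketch in base R that checks whether a numeric code point falls inside any of them. The names `private_use_ranges` and `is_private_use` are hypothetical helpers made up for this illustration; only the range limits come from the glossary quote.

```r
# Private-use ranges, copied from the Unicode glossary quote above.
private_use_ranges <- list(
  c(0xE000,   0xF8FF),    # first range
  c(0xF0000,  0xFFFFD),   # second range
  c(0x100000, 0x10FFFD)   # third range
)

# Hypothetical helper: TRUE if a single numeric code point lies in any range.
is_private_use <- function(code_point) {
  any(vapply(
    private_use_ranges,
    function(r) code_point >= r[1] && code_point <= r[2],
    logical(1)
  ))
}

is_private_use(0xE000)    # lower limit of the first range
is_private_use(0x10FFFD)  # upper limit of the third range
is_private_use(0x597D)    # the code point of 好, outside every range
```

The six limits listed above should all pass this check, while ordinary characters such as á or 好 should not.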
The problem
When I try to print the characters in the list above that belong to the first range (i.e. (U+E000) and (U+F8FF)), there's no problem.
stringi::stri_unescape_unicode("\\ue000")
stringi::stri_unescape_unicode("\\uf8ff")
[1] ""
[1] ""
However, when I try to print the characters shown in the list above that belong to the second range (i.e. (U+F0000) and (U+FFFFD)), R doesn't return those characters.
stringi::stri_unescape_unicode("\\uf0000")
stringi::stri_unescape_unicode("\\uffffd")
[1] "0"
[1] "\uffffd"
Similarly, the following doesn't print the characters shown in the list above that belong to the third range (i.e. (U+100000) and (U+10FFFD)).
stringi::stri_unescape_unicode("\\u100000")
stringi::stri_unescape_unicode("\\u10fffd")
[1] "က00"
[1] "ჿfd"
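To see what these calls actually returned, the code points of each result can be inspected with base R's `utf8ToInt()`. The sketch below is only a diagnostic, not a confirmed explanation. It also tries two alternatives that rest on assumptions of mine: that `stri_unescape_unicode()` accepts an uppercase `\U` escape followed by exactly eight hex digits, and that `intToUtf8()` can build the character directly from the number. The variable names are made up for this example.

```r
library(stringi)

# Code points of what the six-digit \u escape produced.
short_escape <- stri_unescape_unicode("\\u100000")
short_points <- utf8ToInt(short_escape)
format(as.hexmode(short_points))  # hex view of each code point in the result

# Assumption: \U followed by eight hex digits denotes a single code point.
long_escape <- stri_unescape_unicode("\\U00100000")
long_points <- utf8ToInt(long_escape)
format(as.hexmode(long_points))

# Assumption: base R can build the same character straight from the number.
base_points <- utf8ToInt(intToUtf8(0x100000))
format(as.hexmode(base_points))
```

If the first result holds three code points (U+1000 followed by two "0" characters), that would match the "က00" output above and suggest that `\u` consumed only the first four hex digits.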
The question
Why isn't stringi::stri_unescape_unicode() able to display characters that belong to the ranges U+F0000..U+FFFFD or U+100000..U+10FFFD?
Is there any function in R that is able to return those characters?