While fetching data from a Greek site (the data also contains Greek characters), I expected the first value to be "ΚΡΗΤΗ", but instead I received the string "ÎšÎ¡Î—Î¤Î—".
While trying to figure out the problem, I tested the following:
charToRaw("ΚΡΗΤΗ")
[1] ce 9a ce a1 ce 97 ce a4 ce 97
charToRaw("ÎšÎ¡Î—Î¤Î—")
[1] ce 9a ce a1 ce 97 ce a4 ce 97
At first these seemed like the same thing. The problem is that each Greek character takes up two bytes, as shown below:
charToRaw("Κ")
[1] ce 9a
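As a side check on that byte pair, here is a minimal sketch, assuming the bytes are UTF-8: ce 9a is the two-byte UTF-8 form of the single code point U+039A (GREEK CAPITAL LETTER KAPPA). It uses only base R, and the variable names are just for illustration.

```r
# The two bytes that charToRaw("Κ") printed above.
kappa_bytes <- as.raw(c(0xce, 0x9a))

# utf8ToInt() decodes a UTF-8 byte sequence into Unicode code points.
code_point <- utf8ToInt(rawToChar(kappa_bytes))
print(code_point)  # a single code point, 922 (0x039A)

# Going the other way: intToUtf8() builds a UTF-8 string from a code point,
# and charToRaw() shows the same two bytes again.
print(charToRaw(intToUtf8(0x039A)))
```

So the two bytes are not two values; they are one character spread over two bytes.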
But when I tried the opposite, namely converting the two raw bytes back to a character with rawToChar, I ran into the following problem: each of the two bytes is converted to a separate character.
rawToChar(as.raw(c(0xce, 0x9a)))
[1] "Îš"
So I looked for a way to force rawToChar to treat two bytes as one value, but I couldn't find one. That led me to write a custom function to achieve my goal, but unfortunately I then hit a new problem. When I pass the first value of the data acquired from the site to charToRaw, I get a different output from the one I get when I copy the content of that value and pass it to charToRaw. You can see this in the following snippet:
> data$area[1]
[1] "ÎšÎ¡Î—Î¤Î—"
> copiedValue = "ÎšÎ¡Î—Î¤Î—"
> copiedValue
[1] "ÎšÎ¡Î—Î¤Î—"
> identical(data$area[1], copiedValue)
[1] TRUE
> charToRaw(data$area[1])
[1] c3 8e c5 a1 c3 8e c2 a1 c3 8e e2 80 94 c3 8e c2 a4 c3 8e e2 80 94
> charToRaw(copiedValue)
[1] ce 9a ce a1 ce 97 ce a4 ce 97
Finally, I tried the iconv function with a number of different encodings, but none of them seemed to solve the problem.
> iconv("ΚΡΗΤΗ", from = "Windows-1252", to = "UTF8")
[1] "ΚΡΗΤΗ"
> iconv("ΚΡΗΤΗ", from = "UTF8", to = "Windows-1252")
[1] NA
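For reference, the two byte dumps from charToRaw look related: reading each byte of ce 9a ce a1 ... as Windows-1252 and re-encoding the result as UTF-8 gives exactly the longer sequence printed for data$area[1]. The following is a minimal sketch of that relation, assuming Windows-1252 really is the intermediate code page (a guess based on the bytes, not something confirmed by the site). It works on raw bytes, so it should not depend on how the console encodes a typed literal. The variable names are only illustrative.

```r
# Bytes printed by charToRaw(copiedValue) and charToRaw(data$area[1]).
short_bytes <- as.raw(c(0xce, 0x9a, 0xce, 0xa1, 0xce, 0x97, 0xce, 0xa4, 0xce, 0x97))
long_bytes  <- as.raw(c(0xc3, 0x8e, 0xc5, 0xa1, 0xc3, 0x8e, 0xc2, 0xa1,
                        0xc3, 0x8e, 0xe2, 0x80, 0x94, 0xc3, 0x8e, 0xc2, 0xa4,
                        0xc3, 0x8e, 0xe2, 0x80, 0x94))

# Forward: treat the short bytes as Windows-1252 text and encode it as UTF-8.
reencoded <- iconv(rawToChar(short_bytes), from = "Windows-1252", to = "UTF-8")
print(identical(charToRaw(reencoded), long_bytes))

# Backward: undo that step, then declare the resulting bytes to be UTF-8.
repaired <- iconv(rawToChar(long_bytes), from = "UTF-8", to = "Windows-1252")
print(identical(charToRaw(repaired), short_bytes))
Encoding(repaired) <- "UTF-8"
print(nchar(repaired, type = "chars"))  # characters counted once marked as UTF-8
```

If the assumption holds, the backward step applied to the fetched column would be the direction to try. If the site's bytes went through a different code page, the `to` argument would need to change accordingly.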