Is there a way in R to merge two raw values (bytes) into a single value to be used as input to rawToChar()?

Question

While fetching data from a Greek site (the data also contains Greek characters), I expected the first value to be "ΚΡΗΤΗ", but instead I received the string "ÎšÎ¡Î—Î¤Î—".

As I was trying to figure out the problem I tested the following:

charToRaw("ÎšÎ¡Î—Î¤Î—")
[1] ce 9a ce a1 ce 97 ce a4 ce 97

charToRaw("ΚΡΗΤΗ")
[1] ce 9a ce a1 ce 97 ce a4 ce 97

which at first seemed like the same thing. The problem is that each Greek character is encoded as two bytes, as shown below:

charToRaw("Κ")
[1] ce 9a

But when I tried the reverse, namely converting the two raw bytes back to a character with rawToChar, each of the two bytes was converted to a separate character:

rawToChar(as.raw(c(0xce, 0x9a)))
[1] "Îš"
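
For illustration, here is a minimal sketch of one way the two bytes can be read as a single character. It assumes the bytes are UTF-8 and that the mangled display comes from R interpreting the unmarked string in the session's native encoding (for example Windows-1252). The variable names `bytes` and `s` are only for this example.

```r
# The two bytes that encode the Greek capital kappa in UTF-8.
bytes <- as.raw(c(0xce, 0x9a))

# rawToChar() already returns ONE string holding both bytes;
# it just carries no encoding mark, so a non-UTF-8 session
# may display each byte as a separate character.
s <- rawToChar(bytes)

# Declare how the bytes should be interpreted (the bytes themselves are unchanged).
Encoding(s) <- "UTF-8"

nchar(s, type = "bytes")  # 2 bytes
nchar(s, type = "chars")  # 1 character
utf8ToInt(s)              # 922, i.e. U+039A
```

If the bytes were not actually UTF-8, marking them this way would produce an invalid string rather than fix anything, so the encoding has to be known in advance.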

Therefore I tried to find out whether rawToChar can be forced to treat two bytes as one value, but I couldn't find a way. This led me to write a custom function to achieve my goal, but unfortunately I ran into a new problem. When the first value of the data acquired from the site is passed to charToRaw, the output differs from what I get if I copy the content of that value and pass the copy to charToRaw. You can see that in the following snippet:

> data$area[1]
[1] "ÎšÎ¡Î—Î¤Î—"
> copiedValue = "ÎšÎ¡Î—Î¤Î—"
> copiedValue
[1] "ÎšÎ¡Î—Î¤Î—"
> identical(data$area[1], copiedValue)
[1] TRUE
> charToRaw(data$area[1])
[1] c3 8e c5 a1 c3 8e c2 a1 c3 8e e2 80 94 c3 8e c2 a4 c3 8e e2 80 94
> charToRaw(copiedValue)
[1] ce 9a ce a1 ce 97 ce a4 ce 97
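
The longer byte sequence is consistent with the UTF-8 bytes of "ΚΡΗΤΗ" having been misread as Windows-1252 and then converted to UTF-8 a second time. The sketch below reproduces that under this assumption. It passes the raw bytes to `iconv()` as a list so that no console encoding is involved. The names `utf8_bytes` and `double_encoded` are hypothetical.

```r
# UTF-8 bytes of the five Greek letters.
utf8_bytes <- as.raw(c(0xce, 0x9a, 0xce, 0xa1, 0xce, 0x97, 0xce, 0xa4, 0xce, 0x97))

# Treat those bytes as if they were Windows-1252 text and convert to UTF-8.
# iconv() accepts a list of raw vectors; toRaw = TRUE returns raw vectors too.
double_encoded <- iconv(list(utf8_bytes),
                        from = "Windows-1252", to = "UTF-8",
                        toRaw = TRUE)[[1]]

print(double_encoded)
# c3 8e c5 a1 c3 8e c2 a1 c3 8e e2 80 94 c3 8e c2 a4 c3 8e e2 80 94
```

This would also explain the `TRUE` from `identical()`: according to its documentation, strings in different marked encodings are regarded as identical if they agree once translated to UTF-8, even though their stored bytes differ.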

Finally, I tried the iconv function with a number of different encodings, but that didn't solve the problem either.

iconv("ΚΡΗΤΗ", from = "Windows-1252", to = "UTF8")
[1] "ÎšÎ¡Î—Î¤Î—"
> iconv("ÎšÎ¡Î—Î¤Î—", from = "UTF8", to = "Windows-1252")
[1] NA
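
A possible reason for the `NA` is that the string typed at the console is stored in the native encoding, so its bytes are already `ce 9a ...`. Read as UTF-8 they are Greek letters, which Windows-1252 cannot represent. As a hedged sketch, the same conversion applied to the bytes actually held in `data$area[1]` (copied from the output shown earlier) does recover the text. `double_encoded`, `repaired_bytes` and `repaired` are names made up for this example.

```r
# Bytes reported by charToRaw(data$area[1]) in the question.
double_encoded <- as.raw(c(0xc3, 0x8e, 0xc5, 0xa1, 0xc3, 0x8e, 0xc2, 0xa1,
                           0xc3, 0x8e, 0xe2, 0x80, 0x94, 0xc3, 0x8e, 0xc2,
                           0xa4, 0xc3, 0x8e, 0xe2, 0x80, 0x94))

# Undo the second encoding step: UTF-8 -> Windows-1252, working on raw bytes.
repaired_bytes <- iconv(list(double_encoded),
                        from = "UTF-8", to = "Windows-1252",
                        toRaw = TRUE)[[1]]

# The result is the original UTF-8 byte sequence; mark it as such.
repaired <- rawToChar(repaired_bytes)
Encoding(repaired) <- "UTF-8"

print(repaired_bytes)
# ce 9a ce a1 ce 97 ce a4 ce 97
```

This only repairs text that was double-encoded in exactly this way. Declaring the correct encoding when the response is first read avoids the need for it.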

How did you fetch the data originally? Chances are if you specified the correct encoding then you wouldn't have to clean it up after the fact. What OS are you using? — MrFlick, Dec 24 '20 at 18:16
For fetching I used httr::GET, and for converting the data I used jsonlite::fromJSON: `library(httr)` `library(jsonlite)` `res = GET(selectData["powerConsumption"], add_headers(Authorization = my_token))` `data = fromJSON(rawToChar(res$content))`. I am using Windows 10. **I have now tried rjson::fromJSON and it seems that the encoding problem has been solved**; I can now read the Greek characters, but my question remains. — Sideridis, Dec 24 '20 at 21:18

