
I am working with data from many European languages. R does not handle special characters correctly, e.g. it prints "c" instead of "ć".

> "ć"
[1] "c" 

I have come across this several times and found workarounds (read.csv and other functions have an encoding option), but these do not solve the problem described above, where the character is typed directly at the console. I also tried

> a <- "ć"
> Encoding(a)
[1] "unknown"

and setting the encoding option to "UTF-8", without success. Is there a way to tell R which encoding to use when reading from the console, before the character is actually assigned?

Doctor G

1 Answer


This happens because the character is not available in the locale you have set. You can change the locale to one that has the character, but this may affect other characters, and the character may be interpreted differently if you subsequently change locales, so caveat emptor.

> Sys.setlocale("LC_CTYPE","Polish")
[1] "Polish_Poland.1250"
> "ć"
[1] "ć"
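
If you do switch locales, a minimal sketch of doing it reversibly might look like the following. It assumes the Windows-style locale name "Polish" used above; locale names are platform-dependent (on Linux or macOS the equivalent would be something like "pl_PL.UTF-8"), so the code checks whether the request was honored before relying on it.

```r
# Remember the current character-type locale so it can be restored later.
old <- Sys.getlocale("LC_CTYPE")

# Sys.setlocale() returns "" (with a warning) when the OS cannot honor the
# request. "Polish" is an assumption: it is the Windows name for this locale.
new <- suppressWarnings(Sys.setlocale("LC_CTYPE", "Polish"))

if (nzchar(new)) {
  print("\u0107")   # the character should now display in this locale
} else {
  message("Locale 'Polish' is not available on this system")
}

# Switch back so other characters are interpreted as they were before.
invisible(Sys.setlocale("LC_CTYPE", old))
```

Restoring the saved value limits the side effects mentioned above to the code that runs between the two Sys.setlocale() calls.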

The more robust way to deal with this character is to use its Unicode escape, which does not depend on the current locale. The drawback is that you will have to pre-process your data to replace the literal characters with their escapes.

"\u0107"
[1] "ć"
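
As a rough illustration of that pre-processing step, here is a sketch in base R. The helper to_escapes() is a hypothetical name, not part of R; it assumes its input is a single string that R can convert to UTF-8, and it only handles code points up to U+FFFF (higher ones would need the \U form).

```r
x <- "\u0107"        # U+0107, LATIN SMALL LETTER C WITH ACUTE
utf8ToInt(x)         # 263, i.e. hex 107

# Hypothetical helper: rewrite every non-ASCII character in one string as a
# \uXXXX escape, so the result can be pasted into ASCII-only source code.
to_escapes <- function(s) {
  cps <- utf8ToInt(enc2utf8(s))                  # integer code points
  paste(ifelse(cps < 128L,
               intToUtf8(cps, multiple = TRUE),  # keep ASCII characters
               sprintf("\\u%04x", cps)),         # escape everything else
        collapse = "")
}

cat(to_escapes("Kowali\u0107"), "\n")   # prints Kowali\u0107
```

The escaped text is plain ASCII, so it survives being read or typed under any locale.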
James