I have some Russian texts that I would like to work with. Running a quick test:

> x <- "привет"
> x
[1] "\320\277\321\200\320\270\320\262\320\265\321\202"
> text <- scan("./Texts/Chekhov.txt", what = "character",
             encoding = "UTF-8")
> text[1]
[1] "<U+0413><U+041E><U+0420><U+0415>"

The file Chekhov.txt is a UTF-8 encoded text file containing a Russian text. So far so good: the output represents the first word, "ГОРЕ". But how do I get R to print the Cyrillic letters themselves instead of the Unicode escape representation?

From what I have found, the usual advice is to change the locale:

> Sys.setlocale(category = "LC_COLLATE", locale = "Russian")

When I try this, I get the following error message:

> OS reports request to set locale to "Russian" cannot be honored

It also seems strange to me that I get the following:

> Sys.getlocale()
[1] "C"

I'm on Mac OS X, using RStudio.

pwwolff