I am having trouble displaying Russian characters in the RStudio console. I load an Excel file containing Russian text using the readxl package. The Cyrillic displays properly in the data frame. However, if I run a function whose output includes the variable names, the RStudio console displays symbols instead of the proper Cyrillic characters.

test.xlsx contains two columns - зависимая переменная (dependent variable - numeric) and независимая переменная (independent variable, factor).

зависимая_переменная    независимая_переменная
5   а
6   б
8   в
8   а
7.5 б
6   в
5   а
4   б
3   в
2   а
5   б

My code:

Sys.setlocale(locale = "Russian")
install.packages("readxl")
require(readxl)
basetable <- readxl::read_excel('test.xlsx',sheet = 1)
View(basetable)
basetable$независимая_переменная <- as.factor(basetable$независимая_переменная)

str(basetable)

This is what I get for the output of the str function:

 Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  11 obs. of  2 variables:
 $ çàâèñèìàÿ_ïåðåìåííàÿ  : num  5 6 8 8 7.5 6 5 4 3 2 ...
 $ íåçàâèñèìàÿ_ïåðåìåííàÿ: Factor w/ 3 levels "а","б","в": 1 2 3 1 2 3 1 2 3 1 ...

I want to have the variable names displayed properly in Russian because I will be building many models from this data. For reference, here is my sessionInfo()

R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Russian_Russia.1251  LC_CTYPE=Russian_Russia.1251   
[3] LC_MONETARY=Russian_Russia.1251 LC_NUMERIC=C                   
[5] LC_TIME=Russian_Russia.1251    
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] readxl_0.1.1 shiny_0.13.1 dplyr_0.4.3 
loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2      digest_0.6.9     assertthat_0.1   mime_0.4        
 [5] chron_2.3-47     R6_2.1.2         xtable_1.8-2     jsonlite_0.9.19 
 [9] DBI_0.3.1        magrittr_1.5     lazyeval_0.1.10  data.table_1.9.6
 [13] tools_3.2.3      httpuv_1.3.3     parallel_3.2.3     htmltools_0.3        
matsuo_basho
  • I'm not sure whether `readxl` supports an encoding argument. I would suggest saving your file in, say, *.csv format and then reading it with `read.table()` or `fread()` with `encoding = "UTF-8"`. Also, a quick search showed that the `xlsx` package should have an Excel read function with an encoding argument. – statespace Apr 01 '16 at 13:24
  • Batanichek - thanks, I actually ran it with the proper variable, corrected it just now. – matsuo_basho Apr 01 '16 at 13:28
  • @A.Val. I just ran it with the xlsx package and specified the encoding. I still get the same gibberish Cyrillic in the output. `basetable <- read.xlsx('test.xlsx',sheetIndex=1,encoding="UTF-8")` – matsuo_basho Apr 01 '16 at 13:33
  • @A.Val. The problem is that when I save the xlsx file in CSV format, I get question marks wherever the Cyrillic characters are. That's why I turned to the readxl package, which at least displays the data properly when viewing the variables anywhere except the console. – matsuo_basho Apr 01 '16 at 13:38
  • Another option is to save it as Unicode text (which is a tab-separated file); this preserves the Cyrillic text. I'm trying this myself, but can't really get it to work. Will let you know if I come up with anything. – statespace Apr 01 '16 at 13:43
  • Do you have the problem if you create the data.frame manually, for example `df1=data.frame(ппп=1,проен=2); str(df1)`? – Batanichek Apr 01 '16 at 13:56
  • Do you have the problem in RStudio only, or in command-line R as well? – Batanichek Apr 01 '16 at 13:58
  • @Batanichek Yes, I have the same problem if I create the data frame manually in RStudio. With respect to command-line R, I'm not able to create the data frame. `> df1=data.frame(ппп=1,проен=2) Error: unexpected input in "df1=data.frame(\"` Looks like it has a problem with me entering Russian characters. So perhaps it's a setting that I have to change in R (I'm using the RGui)? – matsuo_basho Apr 01 '16 at 14:17
  • If I just create a data frame right in the R (not Rstudio) console, my Russian а comes out like this `> df1 <- data.frame(ô = 1:5)` – matsuo_basho Apr 01 '16 at 14:33
  • Unfortunately, RStudio currently has some trouble displaying multibyte characters on Windows. :-/ This is a bug in RStudio and unfortunately I don't think there is a resolution at the moment. – Kevin Ushey Apr 02 '16 at 00:09

1 Answer

Try changing the encoding of the data frame's column names to UTF-8.

Encoding(colnames(YOURDATAFRAME)) <- "UTF-8"
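
To make the idea concrete, here is a minimal sketch, not a tested fix for the RStudio display problem. The data frame `df` is a hypothetical stand-in for `basetable`, and the Cyrillic names are written as `\u` escapes so the script does not depend on how the source file itself is saved. Note that `Encoding<-` only changes the declared encoding and leaves the bytes untouched, whereas `enc2utf8()` actually converts from the native encoding; which of the two helps depends on how the names were read in, so both are shown as assumptions to try.

```r
# Hypothetical stand-in for `basetable` from the question.
df <- data.frame(a = c(5, 6, 8), b = factor(c("x", "y", "z")))

# Cyrillic column names given as \u escapes (R marks such strings as UTF-8).
colnames(df) <- c(
  "\u0437\u0430\u0432\u0438\u0441\u0438\u043c\u0430\u044f",
  "\u043d\u0435\u0437\u0430\u0432\u0438\u0441\u0438\u043c\u0430\u044f"
)

# Inspect what encoding R currently declares for the names.
print(Encoding(colnames(df)))

# Option 1 (the suggestion above): declare the names as UTF-8.
# This only sets the encoding mark; the underlying bytes are not changed.
Encoding(colnames(df)) <- "UTF-8"

# Option 2: convert the names to UTF-8 from their current encoding.
colnames(df) <- enc2utf8(colnames(df))

str(df)
```

If the names were actually stored in the native code page (for example CP1251), marking them as UTF-8 with option 1 would mislabel the bytes, so checking `Encoding()` first is worthwhile.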
Sergey P.