I'm Chinese and deal with Chinese characters frequently, but I still can't find a way to get R to handle Chinese encoding properly.
The best approach so far is read.csv(file_name, fileEncoding = "UTF8"). But when I list the files in the working directory using dir(), the names come out garbled:
> dir()
[1] "16PF.csv" "codebook.html" "Table_2.xls"
[4] "win-library" "鏈牎鏂囩.csv" "鏈牎鏂囩.R"
This is after I set the system locale:
> Sys.setlocale(category = "LC_ALL", locale = "chs") #cht for traditional Chinese, etc.
[1] "LC_COLLATE=Chinese_People's Republic of China.936;LC_CTYPE=Chinese_People's Republic of China.936;LC_MONETARY=Chinese_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese_People's Republic of China.936"
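To check what R treats as its native encoding after a call like the one above, a small diagnostic can help. This is a minimal sketch; the helper name native_encoding_report is hypothetical, and it relies only on base R's Sys.getlocale() and l10n_info(). With the "chs" locale active on Windows, it would be expected to report code page 936, matching the ".936" suffix in the output above.

```r
# Hypothetical helper: summarise the session's native encoding.
native_encoding_report <- function() {
  info <- l10n_info()
  list(
    lc_ctype = Sys.getlocale("LC_CTYPE"),
    # TRUE when the native encoding is UTF-8
    utf8 = isTRUE(info[["UTF-8"]]),
    # Windows code page (e.g. 936 for GBK); NA on platforms without one
    codepage = if (is.null(info[["codepage"]])) NA_integer_ else info[["codepage"]]
  )
}

str(native_encoding_report())
```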
In RStudio's Global Options -> Code -> Saving, the default text encoding is set to UTF-8.
I'd also like RStudio to read Chinese text without my having to run a command every time I open it.
Update: the method provided by @niki works for reading plain text: https://imgur.com/a/6XtwmzC
However, I need to deal with Chinese characters in data frames.
I just discovered that read.csv(file_name, fileEncoding = "UTF8") actually doesn't work:
> grades=read.csv("grades.csv",fileEncoding="UTF-8")
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
invalid input found on input connection 'grades.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'grades.csv'
It only works if I set the system locale first:
> Sys.setlocale("LC_ALL","Chinese")
[1] "LC_COLLATE=Chinese (Simplified)_China.936;LC_CTYPE=Chinese (Simplified)_China.936;LC_MONETARY=Chinese (Simplified)_China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_China.936"
> grades=read.csv("grades.csv",fileEncoding="UTF-8")
> head(grades[,c(3,6,8)],3)
文理类型 学生类型 数学
1 文科 应届 138
2 文科 应届 134
3 文科 应届 119
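The warnings above are consistent with how fileEncoding works: read.csv decodes the file from UTF-8 and then re-encodes it into the session's native encoding, so under a code page that cannot represent Chinese characters the conversion fails with "invalid input found on input connection", while under the .936 (GBK) locale it succeeds. A commonly suggested alternative is encoding = "UTF-8", which marks the strings as UTF-8 instead of converting them. The sketch below assumes a small, invented stand-in for grades.csv (the column names track, type and math are hypothetical).

```r
# Hypothetical stand-in for grades.csv: column names and values are invented.
csv_text <- paste0(
  "track,type,math\n",
  "\u6587\u79d1,\u5e94\u5c4a,138\n",
  "\u6587\u79d1,\u5e94\u5c4a,134\n"
)
path <- tempfile(fileext = ".csv")
# Write the UTF-8 bytes directly so the file contents do not depend on the locale.
writeBin(charToRaw(enc2utf8(csv_text)), path)

# encoding = "UTF-8" marks the strings as UTF-8 instead of converting them to
# the native encoding, which is what fileEncoding = "UTF-8" does.
grades <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)
head(grades)
```

How the characters are printed may still depend on the locale; in an older Windows R session with a non-Chinese locale they may appear as <U+...> escapes even though the stored strings are correct. Since R 4.2.0, R on Windows uses UTF-8 as its native encoding, which should make fileEncoding = "UTF-8" work without changing the locale.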
Because it's cumbersome to set the system locale every time I start RStudio, I tried putting Sys.setlocale("LC_ALL","Chinese") in my Rprofile, but it didn't seem to work.
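For reference, a minimal sketch of such an Rprofile entry, assuming Windows and an R release older than 4.2.0, where the native encoding follows the system locale. The version guard is an added assumption, not part of the original setup: it keeps the same file from switching newer R (where UTF-8 is already native) to code page 936.

```r
# Sketch of an .Rprofile / Rprofile.site entry (assumption: Windows, R < 4.2.0).
# local() keeps the temporary variable out of the global environment.
local({
  is_old_windows_r <- .Platform$OS.type == "windows" && getRversion() < "4.2.0"
  if (is_old_windows_r) {
    # Same call as above; "Chinese" is a Windows locale name (code page 936).
    invisible(Sys.setlocale("LC_ALL", "Chinese"))
  }
})
```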
Update 2: It seems I had been editing the wrong Rprofile file. The problem is now resolved.
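Editing the wrong file is easy because R checks several locations. The sketch below follows the search order described in ?Startup: R_PROFILE_USER if set, otherwise .Rprofile in the start-up directory, otherwise .Rprofile in the home directory; the site-wide Rprofile.site (or R_PROFILE) is read first. The helper names are hypothetical, and start_dir defaults to the current working directory, which is only an approximation of the directory R actually started in. On Windows, R's "~" typically points to the Documents folder rather than the user profile folder.

```r
# Hypothetical helper: the user profile path R would use, per ?Startup.
user_rprofile_path <- function(start_dir = getwd()) {
  env_path <- Sys.getenv("R_PROFILE_USER")
  if (nzchar(env_path)) {
    return(path.expand(env_path))
  }
  local_path <- file.path(start_dir, ".Rprofile")
  if (file.exists(local_path)) {
    return(local_path)
  }
  # Returned even if it does not exist yet, as the place to create it.
  path.expand("~/.Rprofile")
}

# Hypothetical helper: the site-wide profile, read before the user profile.
site_rprofile_path <- function() {
  env_path <- Sys.getenv("R_PROFILE")
  if (nzchar(env_path)) path.expand(env_path) else file.path(R.home("etc"), "Rprofile.site")
}

user_rprofile_path()
site_rprofile_path()
```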