While I was trying to read a .txt file with read.table(), I ran into problems viewing the dataset in RStudio. The original .txt file has three columns of data (ID, Content in Cantonese, and Time) in the following format:
100008251304976 你又知喎 2019-10-04 16:52:15
100027970365477 甘你買多幾包花生,小心熱氣 2019-10-04 16:23:43
I used the following code to read it into RStudio:
x = read.table('comment.txt', encoding = 'utf-8', quote = "", fill = TRUE, sep = '\t')
but the text in the result is garbled:
ç”˜ä½ è²·å¤šå¹¾åŒ…èŠ±ç”Ÿï¼Œå°å¿ƒç†±æ°£ 2019å¹´10æ
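The garbled text follows a recognizable pattern. It is what UTF-8 bytes look like when they are decoded as Windows-1252 (CP1252), the code page named in LC_CTYPE below. A minimal sketch that reproduces this with 花生 ("\u82b1\u751f") from the sample rows, using only base R's iconv(). The variable names are illustrative. iconv() ignores any declared encoding on its input and works on the raw bytes, which is what makes the round trip possible:

```r
# 花生 from the sample data, written with \u escapes so the script stays ASCII.
peanut <- "\u82b1\u751f"

# Reinterpret the UTF-8 bytes as CP1252 text, then convert that to UTF-8.
# This mimics reading a UTF-8 file as if it were in the CP1252 locale encoding.
garbled <- iconv(peanut, from = "CP1252", to = "UTF-8")
print(garbled)
# [1] "èŠ±ç”Ÿ"

# Reverse it: converting back to CP1252 recovers the original UTF-8 bytes,
# which then only need to be declared as UTF-8.
repaired <- iconv(garbled, from = "UTF-8", to = "CP1252")
Encoding(repaired) <- "UTF-8"
print(repaired)
```

The output "èŠ±ç”Ÿ" matches the corresponding fragment of the garbled line above. This suggests that the file itself is valid UTF-8 and that only the interpretation of its bytes is off.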
Then I checked my session info and locale settings:
sessionInfo()
#R version 3.6.1 (2019-07-05)
#Platform: x86_64-w64-mingw32/x64 (64-bit)
#Running under: Windows 10 x64 (build 18362)
#Matrix products: default
#locale:
#[1] LC_COLLATE=English_Hong Kong SAR.1252 LC_CTYPE=English_Hong Kong SAR.1252
#[3] LC_MONETARY=English_Hong Kong SAR.1252 LC_NUMERIC=C
#[5] LC_TIME=English_Hong Kong SAR.1252
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#loaded via a namespace (and not attached):
#[1] compiler_3.6.1 rsconnect_0.8.16 tools_3.6.1 tinytex_0.16 xfun_0.10
#[6] packrat_0.5.0
Sys.getlocale()
# "LC_COLLATE=English_Hong Kong SAR.1252;LC_CTYPE=English_Hong Kong SAR.1252;LC_MONETARY=English_Hong Kong SAR.1252;LC_NUMERIC=C;LC_TIME=English_Hong Kong SAR.1252"
Sys.getenv("LANG")
# "C.UTF-8"
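LANG says UTF-8 while every LC_* category says code page 1252, so the two settings disagree. A small sketch using base R's l10n_info() reports which encoding R is actually using for strings in the session. The branching and messages are illustrative. On Windows the returned list should also contain a "codepage" element (1252 would be expected for the locale shown above):

```r
# Ask R which native encoding this session uses.
info <- l10n_info()
str(info)

if (isTRUE(info[["UTF-8"]])) {
  message("Native encoding is UTF-8.")
} else {
  # Strings read without a declared encoding are treated as native-encoded,
  # so UTF-8 bytes would be interpreted in the locale's code page instead.
  message("Native encoding is not UTF-8.")
}
```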
Any ideas why I cannot load the .txt file properly? By the way, I am able to type and print Traditional Chinese in RStudio:
print("試試")
# [1] "試試"
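A self-contained reproduction makes this easier for others to test. The sketch below is hypothetical. It writes the two sample rows shown above to a temporary file as UTF-8, standing in for comment.txt, and reads the file back. The column names and colClasses are assumptions added for illustration. It spells the encoding "UTF-8", which is the value listed for read.table()'s `encoding` argument in `?read.table`. That argument only marks the strings as UTF-8. By contrast, `fileEncoding` re-encodes the input into the native encoding, and CP1252 in this locale has no Chinese characters.

```r
# The two sample rows, tab-separated, with the Chinese text as \u escapes.
rows <- c(
  "100008251304976\t\u4f60\u53c8\u77e5\u558e\t2019-10-04 16:52:15",
  "100027970365477\t\u7518\u4f60\u8cb7\u591a\u5e7e\u5305\u82b1\u751f,\u5c0f\u5fc3\u71b1\u6c23\t2019-10-04 16:23:43"
)

# Write the UTF-8 bytes unchanged to a temporary file (stand-in for comment.txt).
tmp <- tempfile(fileext = ".txt")
writeLines(enc2utf8(rows), tmp, useBytes = TRUE)

# encoding = "UTF-8" marks the strings as UTF-8 without converting them;
# colClasses = "character" keeps the long IDs as text and avoids factors.
x <- read.table(tmp, sep = "\t", quote = "", fill = TRUE,
                encoding = "UTF-8", colClasses = "character",
                col.names = c("ID", "Content", "Time"))
print(Encoding(x$Content))
print(x$Content)
```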