I can't get to the bottom of this: I want to read a CSV file that contains Arabic characters, but R is not reading it properly.

This is my sessionInfo():

R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.3 plyr_1.8.3 

loaded via a namespace (and not attached):
[1] magrittr_1.5   R6_2.1.2       assertthat_0.1 parallel_3.2.4 DBI_0.3.1      tools_3.2.4
[7] Rcpp_0.12.4

I tried this:

ar <- read.csv(file.choose(), encoding = "UTF-8")

and this:

ar <- read.csv(file.choose(), encoding = "Windows-1256")

Neither worked. I also tried setting the locale to Arabic, but no luck:

Sys.setlocale("LC_ALL","Arabic")

Any suggestions?

Artem
Rayan Sp
  • Try the `fileEncoding` argument rather than `encoding`. – Thomas Apr 06 '16 at 08:58
  • @Thomas I tried UTF-8 fileEncoding, got the same result. – Rayan Sp Apr 06 '16 at 09:03
  • @Thomas but when I tried Arabic-1256 I got this error: `Warning messages: 1: In read.table(file = file, header = header, sep = sep, quote = quote, : invalid input found on input connection 'D:\CALL_END_DATA_FEB16_040416.txt' 2: In read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'location.txt'` – Rayan Sp Apr 06 '16 at 09:03
  • I think the last line in your file does not end with a newline – chinsoon12 Apr 06 '16 at 09:11
  • @chinsoon12 I don't think that's the case because it can read the file in UTF-8 encoding – Rayan Sp Apr 06 '16 at 09:21
  • @RayanSp Those are warnings, not errors, so the code should have succeeded (though may not look as you intend). – Thomas Apr 06 '16 at 09:26
  • Please try to replicate this in Python 3 and post the details; R and earlier Python versions have issues here. – Pradi KL May 18 '17 at 02:38

1 Answer

You can read the file with readLines, passing warn = FALSE and encoding = "UTF-8", and then call read.csv with its text argument set to the result of readLines, as below.

arabic.csv content:

LabelName,Label1,Label2,SpeciesLabel,Group,Subgroup,Species
التسمية 1,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 2,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 3,Group 1,Subgroup 1,Species 1,1,1,1

R code to read csv file:

arabic <- readLines("arabic.csv", warn = FALSE, encoding = "UTF-8")
Data <- read.csv(text = arabic)
str(Data)

Output:

'data.frame':   3 obs. of  7 variables:
 $ X.U.FEFF.LabelName: Factor w/ 3 levels "التسمية 1","التسمية 2",..: 1 2 3
 $ Label1            : Factor w/ 1 level "Group 1": 1 1 1
 $ Label2            : Factor w/ 1 level "Subgroup 1": 1 1 1
 $ SpeciesLabel      : Factor w/ 1 level "Species 1": 1 1 1
 $ Group             : int  1 1 1
 $ Subgroup          : int  1 1 1
 $ Species           : int  1 1 1
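
The `X.U.FEFF.` prefix on the first column name in the output above suggests that the file starts with a UTF-8 byte-order mark (BOM), which read.csv folds into the header. A minimal sketch of one way to drop it, assuming the file really is UTF-8 with a BOM: strip the U+FEFF character from the first line before parsing. The temporary file, the two-column sample data, and the names `path` and `csv_lines` are made up for illustration only.

```r
# Build a small UTF-8 CSV that starts with a BOM (illustrative data only).
path <- tempfile(fileext = ".csv")
csv_lines <- c("LabelName,Group",
               "\u0627\u0644\u062a\u0633\u0645\u064a\u0629 1,1")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)  # the three BOM bytes
writeBin(charToRaw(enc2utf8(paste0(paste(csv_lines, collapse = "\n"), "\n"))), con)
close(con)

# Read the lines as UTF-8, as in the code above.
arabic <- readLines(path, warn = FALSE, encoding = "UTF-8")

# Remove a leading U+FEFF from the header line, if there is one.
arabic[1] <- sub("^\ufeff", "", arabic[1])

# Parse the cleaned lines; keep strings as character rather than factor.
Data <- read.csv(text = arabic, stringsAsFactors = FALSE)
names(Data)
```

With the BOM removed, the first column should come out as plain `LabelName`. R also accepts `fileEncoding = "UTF-8-BOM"` when reading, which strips the mark for you, but because `fileEncoding` re-encodes the input to the native encoding, it may not preserve Arabic text under a Windows-1252 locale like the one in the question.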
Artem