TARGET: Check whether a list of files all have the same encoding before importing and rbind-ing them; if they do not, STOP the run.

# list files & check encoding
library(readr)
FL_PATH <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
lapply(FL_PATH, guess_encoding)

# if any file is "UTF-8", STOP the run; if all are "Shift_JIS", RUN the scripts below:

# import
library(rio)
library(data.table)  # provides rbindlist()
DT <- rbindlist(lapply(FL_PATH, import, sep = ",", setclass = "data.table"))

# more than 500 lines of processing follow, to be run only if the files share the same encoding
DT[,"NEW_COL":="A"]
DT[,"NEW_COL_2":="B"]
.....

# result of lapply(FL_PATH, guess_encoding)
> lapply(FL_PATH,guess_encoding)
[[1]]
# A tibble: 3 x 2
  encoding  confidence
  <chr>          <dbl>
1 Shift_JIS       0.8 
2 GB18030         0.76
3 Big5            0.46

[[2]]
# A tibble: 3 x 2
  encoding  confidence
  <chr>          <dbl>
1 GB18030         0.82
2 UTF-8           0.8 
3 Big5            0.44
  • Problem 1: How do I access the values in the result of lapply(FL_PATH, guess_encoding) to detect UTF-8 and STOP? (Do I have to fix the encoding outside R if UTF-8 is present?)
  • Problem 2: How do I connect the large amount of regular processing code to an "if & STOP run" check?
rane
  • Instead of going through all the results, how about letting `lapply` return only the top result? Try `sapply(FL_PATH, function(x) guess_encoding(x)$encoding[1])` – Rohit Mar 26 '19 at 07:03
  • Thank you Rohit, that's exactly the way to ACCESS the tibble, and readr lists the guess with the highest confidence first. But let's say `grepl("UTF-8", sapply(FL_PATH, function(x) guess_encoding(x)$encoding[1]))` returns TRUE and FALSE; I have no idea how to connect that to an import / do-not-import approach. – rane Mar 26 '19 at 08:24

1 Answer

First, get the most probable encoding:

enc <- sapply(FL_PATH,function(x) guess_encoding(x)$encoding[1])
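To see what that line picks out, here is a minimal sketch using a mock of the two results printed in the question (plain data frames stand in for the tibbles, so it runs without any files). The names `guesses`, `top_enc`, `has_utf8` and `same_encoding` are made up for illustration. It assumes, as the printed output suggests, that the rows come back sorted by confidence.

```r
# Mock of the guess_encoding() output shown in the question (two files).
# With real files this would be: guesses <- lapply(FL_PATH, readr::guess_encoding)
guesses <- list(
  data.frame(encoding   = c("Shift_JIS", "GB18030", "Big5"),
             confidence = c(0.80, 0.76, 0.46),
             stringsAsFactors = FALSE),
  data.frame(encoding   = c("GB18030", "UTF-8", "Big5"),
             confidence = c(0.82, 0.80, 0.44),
             stringsAsFactors = FALSE)
)

# Top guess per file: take the first row of each result.
top_enc <- vapply(guesses, function(g) g$encoding[1], character(1))

# Does "UTF-8" appear anywhere among the candidates for each file?
has_utf8 <- vapply(guesses, function(g) "UTF-8" %in% g$encoding, logical(1))

# Do all files share the same top guess?
same_encoding <- length(unique(top_enc)) == 1

print(top_enc)        # "Shift_JIS" "GB18030"
print(has_utf8)       # FALSE TRUE
print(same_encoding)  # FALSE
```

Note that for the second file UTF-8 is only the second candidate, so a check on the top guess alone would not flag it. Whether to test the top guess or all candidates depends on how much you trust the guesses.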

Then, if any of the files are UTF-8, stop execution.

if (any(grepl('UTF-8', enc, fixed = TRUE)))
  stop('UTF-8 present') # This will stop with an error if true
# Now, read files and rbind (read_csv is from readr, rbindlist from data.table)
dlist <- lapply(FL_PATH, read_csv)
DT <- rbindlist(dlist)
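For Problem 2, one option is to put the check in a small guard function and call it once before the import. This is only a sketch: `assert_encoding` is a hypothetical helper (not part of readr or data.table), and the file names below are made up. It expects a named character vector such as the `enc` produced by `sapply()` above.

```r
# Hypothetical helper: stops unless every file's top guess equals `expected`.
assert_encoding <- function(enc, expected = "Shift_JIS") {
  bad <- enc[enc != expected]
  if (length(bad) > 0) {
    stop("Unexpected encoding in: ",
         paste(names(bad), bad, sep = " = ", collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Example with made-up file names
enc_ok  <- c(a.csv = "Shift_JIS", b.csv = "Shift_JIS")
enc_bad <- c(a.csv = "Shift_JIS", b.csv = "UTF-8")

assert_encoding(enc_ok)  # passes silently

# Capture the error message instead of halting, just to show what it says
msg <- tryCatch(assert_encoding(enc_bad),
                error = function(e) conditionMessage(e))
print(msg)  # "Unexpected encoding in: b.csv = UTF-8"
```

With `assert_encoding(enc)` placed before the import, the error halts the remaining 500+ lines when the script is run with `source()` or `Rscript`. If the lines are pasted into the console one by one, each line runs independently and the error will not stop the later ones. In that case, wrapping the processing in a function or an `if`/`else` is safer. Also, readr assumes UTF-8 by default, so Shift_JIS files probably need `read_csv(x, locale = locale(encoding = "Shift_JIS"))`.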
Rohit