I'd like to read only SELECTED files with sparklyr. I have several csv files (e.g. a1.csv, a2.csv, a3.csv, a4.csv, a5.csv) in one folder, and I'd like to read a2.csv, a3.csv and a4.csv in a single step if possible.
I know I can read a single csv file with spark_read_csv(sc, "cash", "/dir1/folder1/a2"),
so I tried:
a_all <- data.frame(col1=integer(),col2=integer())
a_all <- sdf_copy_to(sc, a_all, "a_all")
for (i in 2:4) {
  tmp1 <- spark_read_csv(sc = sc, name = "tmp1", paste0("/dir1/folder1/a", i))
  a_all <- sdf_bind_rows(a_all, tmp1)
}
As a result, I get a tbl_spark that row-binds a2.csv, a3.csv and a4.csv, i.e. the equivalent of rbind(a2, a3, a4).
I think there is an easier way to do this (maybe without a for loop) by using the path= argument,
but I am not sure how to select only a few of the csv files in a folder. Please help!
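To make the goal concrete, here is a rough, untested sketch of the two shapes I have in mind. It assumes that `sc` is an existing connection from spark_connect(), that the files really end in .csv, and (this is my guess, not something I have confirmed) that the path argument of spark_read_csv accepts Hadoop-style glob patterns such as a[2-4].csv. The names `glob_path`, `paths` and `read_selected` are made up for illustration.

```r
# Option 1 (assumption): one glob pattern that should match only
# a2.csv, a3.csv and a4.csv if Spark expands it like a Hadoop glob.
glob_path <- "/dir1/folder1/a[2-4].csv"

# Option 2: build the exact paths and bind the tables without a for loop.
paths <- paste0("/dir1/folder1/a", 2:4, ".csv")

# Hypothetical helper: reads each path into its own Spark table
# (tmp1, tmp2, ...) and row-binds the results into one tbl_spark.
read_selected <- function(sc, paths) {
  tbls <- lapply(seq_along(paths), function(i) {
    sparklyr::spark_read_csv(sc, name = paste0("tmp", i), path = paths[i])
  })
  do.call(sparklyr::sdf_bind_rows, tbls)
}

# Intended usage (needs a live Spark connection, so left commented out):
# a_all <- sparklyr::spark_read_csv(sc, name = "a_all", path = glob_path)
# a_all <- read_selected(sc, paths)
```

Option 1 would be the nicest if it works, because it is a single read; Option 2 is essentially my loop rewritten with lapply, using a distinct table name per file instead of reusing "tmp1".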