I have 16 GB of RAM, running Windows 10 64-bit with a 64-bit version of R. I'm trying to merge a bunch of CSVs from this link (http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml), specifically the yellow-cab data. Edit: only for one year at the moment, but I would want to import more data once this works.
Here's the code I'm running:
library(readr)

FList <- list.files(pattern = "*.csv")

for (i in 1:length(FList)) {
  print(i)
  # read the i-th file into a variable named after the file
  assign(FList[i], read_csv(FList[i]))
  if (i == 2) {
    # once two files are in, combine them and drop the originals
    DF <- rbind(get(FList[1]), get(FList[2]))
    rm(list = c(FList[1], FList[2]))
  }
  if (i > 2) {
    # append each later file to the running data frame
    DF <- rbind(DF, get(FList[i]))
    rm(list = FList[i])
  }
  gc()
}
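For reference, a pattern I've seen suggested instead of growing DF inside the loop is to read every file into a list and bind the rows once at the end. A minimal sketch, assuming the data.table package is available (its fread()/rbindlist() are the reader and one-shot row binder):

library(data.table)

# Read each CSV into a list element, then bind all rows in one pass,
# avoiding the repeated copies that DF <- rbind(DF, ...) makes.
FList <- list.files(pattern = "*.csv")
DF <- rbindlist(lapply(FList, fread))

I haven't verified this is lighter on memory for files of this size, but it at least skips the assign()/get() bookkeeping.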
I get the error on the 6th iteration. Task Manager shows memory usage in the 90% range during the rbind operation, but it drops to around 60% after it's done.
Running gc() after the error gives the following (the max used column suggests the Vcells peaked around 15.7 GB, which is essentially all of my 16 GB):
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    3821676   204.1   10314672   550.9   13394998   715.4
Vcells 1363034028 10399.2 3007585511 22946.1 2058636792 15706.2
I don't have a lot of experience with this, so any help optimizing the code would be appreciated. P.S. Would running it with read.csv help? I'm assuming the datetime format in a few of the columns might be resource-hungry. I haven't tried it yet because I need those columns in datetime format.
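One idea related to the P.S.: read the datetime columns as plain character (cheap to parse) and convert them once after the merge, instead of letting read_csv guess datetimes for every file. A sketch; the file and column names here are my guesses at what the TLC files use, so substitute the real ones:

library(readr)

# Force the two timestamp columns to character; everything else is guessed.
DF <- read_csv("yellow_tripdata_2016-01.csv",
               col_types = cols(.default = col_guess(),
                                tpep_pickup_datetime  = col_character(),
                                tpep_dropoff_datetime = col_character()))

# Convert to POSIXct once, after all files are merged.
DF$tpep_pickup_datetime <- as.POSIXct(DF$tpep_pickup_datetime,
                                      format = "%Y-%m-%d %H:%M:%S", tz = "UTC")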