I have a folder with 5000 CSV files, each belonging to one location and containing daily rainfall from 1980 to 2015. A sample file has the following structure:
sample.file <- data.frame(location.id = rep(1001, times = 365 * 36),
year = rep(1980:2015, each = 365),
day = rep(1:365, times = 36),
rainfall = sample(1:100, replace = T, 365 * 36))
For each file, I want to read it in, calculate the total rainfall for each year, and write the output back out. There are multiple ways I can do this:
Method 1
library(data.table)
library(dplyr)

for (i in seq_along(names.vec)) {
  name <- names.vec[i]
  dat <- fread(paste0(name, ".csv"))
  dat <- dat %>%
    dplyr::group_by(year) %>%
    dplyr::summarise(tot.rainfall = sum(rainfall))
  fwrite(dat, paste0(name, ".summary.csv"))  # fwrite never writes row names
}
Method 2:
my.files <- list.files(pattern = "\\.csv$")  # regex, not a glob: match files ending in .csv
dat <- lapply(my.files, fread)
dat <- rbindlist(dat)
dat.summary <- dat %>%
  dplyr::group_by(location.id, year) %>%
  dplyr::summarise(tot.rainfall = sum(rainfall))
Method 3:
I want to achieve this using foreach. How can I parallelise the above task using the doParallel package and the foreach function?
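A minimal sketch of the kind of foreach/doParallel loop I have in mind (a sketch only, assuming `names.vec` holds the file names without the `.csv` extension, as in Method 1):

```r
library(foreach)
library(doParallel)
library(data.table)
library(dplyr)

# Start a cluster, leaving one core free for the OS
n.cores <- max(1, parallel::detectCores() - 1)
cl <- makeCluster(n.cores)
registerDoParallel(cl)

# Each worker reads one file, summarises it, and writes its own output,
# so nothing needs to be combined in the master process
foreach(name = names.vec,
        .packages = c("data.table", "dplyr")) %dopar% {
  dat <- fread(paste0(name, ".csv"))
  dat <- dat %>%
    dplyr::group_by(year) %>%
    dplyr::summarise(tot.rainfall = sum(rainfall))
  fwrite(dat, paste0(name, ".summary.csv"))
  NULL  # results go to disk; return nothing to collect
}

stopCluster(cl)
```

Since the files are independent, this is an embarrassingly parallel workload; the main cost per task is disk I/O, so speedup may flatten out well before all cores are in use.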