
I am doing a large simulation for a research project--simulating 1,000 football seasons and analyzing the results. As the seasons will be spread across multiple nodes, I need an easy way to save my output data into a file (or files) to access later. Since I can't control when the nodes will finish, I can't have them all trying to write to the same file at the same time, but if they all save to a different file, I would need a way to aggregate all the data easily afterward. Thoughts?

jntrcs
  • By "nodes" do you mean multiple physical machines? – Hong Ooi Dec 08 '16 at 01:38
  • Good question. The supercomputer has many machines with 24 processors apiece. I'm not sure if I'm going to do the simulation on one machine or across many. – jntrcs Dec 08 '16 at 04:21
  • @jntrcs Is there a common storage area that all the nodes can access? If so, you can determine an appropriate folder structure and save the results of each individual simulation into the corresponding folder on a single drive. The code I posted below would work in this scenario. – dataanalyst Dec 08 '16 at 05:59
  • Do you use `R`'s parallel functions or spread the work _manually_? – ClementWalter Dec 08 '16 at 10:36
  • In any case, you can always generate a key with, for instance, the `digest` package, to be sure each task gets a unique name. Then you can use `save` and, once everything is done, loop over your folder with `list.files`. – ClementWalter Dec 08 '16 at 10:38

1 Answer

I do not know whether this question has been asked already, but here is what I do in my research. You can loop over the file names and aggregate the contents into one object, like so:

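library(data.table)
# Load each chunk's dt.RData (each file must contain an object named `dt`)
# and collect the tables in a list.
chunks <- lapply(1:100, function(i) {
  k <- paste0("C:/chunkruns/dat", i, "/dt.RData")
  load(k)  # restores `dt` into this function's environment
  dt
})

# Bind once at the end instead of growing a table inside a loop
agg.data <- rbindlist(chunks)
rm(chunks)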

The above code assumes that each file is saved in its own folder (dat1, dat2, ...), that every file has the same name (dt.RData), and that every file stores its table as an object named `dt`.
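
For the writing side, the one-folder-per-run layout assumed above could be produced roughly as follows. This is only a minimal sketch: the helper `save_chunk`, the temporary root folder, and the toy `season`/`wins` columns are all hypothetical stand-ins, not part of the original setup.

```r
# Hypothetical helper: save one simulated season into its own folder so
# that no two runs ever write to the same file.
save_chunk <- function(i, dt, root = file.path(tempdir(), "chunkruns")) {
  out_dir <- file.path(root, paste0("dat", i))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(out_dir, "dt.RData")
  save(dt, file = out_file)  # the object name `dt` is what load() restores later
  invisible(out_file)
}

# Toy stand-in for real simulation results (made-up columns)
for (i in 1:3) {
  save_chunk(i, data.frame(season = i, wins = i + 5))
}
```

Because each run only ever touches its own folder, it does not matter in which order, or on which node, the runs finish, provided all nodes can write to the same storage area.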

Alternatively, you can use `list.files` to find every file whose name matches a pattern and then combine them:

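library(data.table)
# List every file under the folder whose name starts with "Output".
# `pattern` is a regular expression, not a shell glob, so use "^Output" rather than "Output*".
k <- list.files(path = "W:/chunkruns/dat", pattern = "^Output", all.files = FALSE, full.names = TRUE, recursive = TRUE)
# Read each CSV; skip = 11 drops the 11 lines that precede the header row in these particular files
m <- lapply(k, FUN = function(x) read.csv(x, skip = 11, header = TRUE))
agg.data <- rbindlist(m)
rm(m)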
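
If the results do not have to be CSV, the same list-then-combine idea also works with one `.rds` file per task. The following is a rough sketch using base R only; the `write_result` helper, the `Output_0001.rds` naming scheme, the temporary folder, and the toy columns are assumptions made for illustration.

```r
# Assumed output folder; in practice this would be storage shared by all nodes.
out_dir <- file.path(tempdir(), "rds_runs")
dir.create(out_dir, showWarnings = FALSE)

# Hypothetical writer: each task saves its result under a unique file name.
write_result <- function(task_id, result) {
  saveRDS(result, file.path(out_dir, sprintf("Output_%04d.rds", task_id)))
}

# Toy stand-in for the real simulations (made-up columns)
for (task_id in 1:4) {
  write_result(task_id, data.frame(season = task_id, points = task_id * 10))
}

# Afterwards: find every result file and stack the tables into one data frame.
files <- list.files(out_dir, pattern = "^Output_.*\\.rds$", full.names = TRUE)
agg <- do.call(rbind, lapply(files, readRDS))
```

With `data.table` loaded, `rbindlist(lapply(files, readRDS))` could replace the `do.call(rbind, ...)` line and is usually faster when there are many files.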
dataanalyst