
The code below is a parallel version of some code I had written earlier. Both versions do what they are supposed to, and both take about 5.7 minutes to complete. Constructing a reproducible example would require making several tif files available, so I haven't done that yet. I'm hoping an expert in doParallel can point out code changes that will make the parallel version run faster.

library(doParallel) #Foreach Parallel Adaptor 
library(foreach) #Provides foreach looping construct
rcpChoices <- c("rcp45", "rcp85") 
modelChoices <- c("HadGEM2-ES", "GFDL-ESM2M",  "MIROC5", "IPSL-CM5A-LR")
comboChoices <- c("midCentury", "endCentury")
bpListRaw <- data.table::as.data.table(read.csv("data-raw/animals/breakpointslistRaw.csv", stringsAsFactors = FALSE))
bpListRaw[, c("max", "model", "period", "rcp") := lapply(.SD, as.character), .SDcols = c("max", "model", "period", "rcp")]
thiList <- c("thi.cattle", "thi.sheep", "thi.goat", "thi.yak", "thi.broiler", "thi.layer", "thi.chicken", "thi.swine")

bpList.modMean <- bpListRaw[0,]
bpTemp <- bpListRaw

#Use foreach loop and %dopar% command
#Register CoreCluster
#Define how many cores you want to use

# generate mean and the standard deviation values across the earth system models (four for the ISIMIP data)
comboChoicesP <- comboChoices[!comboChoices %in% "observed"]
modelChoicesP <- modelChoices[!modelChoices %in% "modMean"]
UseCores <- detectCores() - 1
cl <- makeCluster(UseCores)
registerDoParallel(cl)
clusterEvalQ(cl,{
  library(raster)
  library(data.table)})
clusterExport(cl, varlist = c("modelChoicesP", "month.abb", "thiList", "bpTemp"))

start_time <- Sys.time()
x <- foreach(l = comboChoicesP) %:%
  foreach(j = rcpChoices)  %:%
  foreach(k = thiList) %dopar% {
    modelChoices_short <- unlist(lapply(strsplit(modelChoicesP, "-"), `[[`, 1))

    speciesName <- gsub("thi.", "", k, fixed = TRUE)
    for (i in seq_along(modelChoicesP)) {
      fileName <- paste0("results/", k, "_", modelChoicesP[i], "_", l, "_", j, ".tif")
      rasterName <- paste0("thinameMod", i)
      rtemp <- raster::stack(fileName)
      names(rtemp) <- month.abb
      assign(rasterName, rtemp)
    }
    ras.test <- raster::stack(thinameMod1, thinameMod2, thinameMod3, thinameMod4)

    raster::values(ras.test)[raster::values(ras.test) < 0] <- 0
    indices <- as.numeric(format(as.Date(names(ras.test), format = "%b.%d"), format = "%m"))
    ras.test.mean <- raster::stackApply(ras.test, indices, fun = mean, na.rm = TRUE)
    ras.test.sd <- raster::stackApply(ras.test, indices, fun = sd, na.rm = TRUE)
    names(ras.test.mean) <- month.abb
    names(ras.test.sd) <- month.abb
    fileNameMean <- paste0("results/modMean", "_", k, "_", l, "_", j, ".tif")
    fileNameSD <- paste0("results/modSD", "_", k, "_", l, "_", j, ".tif")
    raster::writeRaster(ras.test.mean, filename = fileNameMean, format = "GTiff", overwrite = TRUE)
    raster::writeRaster(ras.test.sd, filename = fileNameSD, format = "GTiff", overwrite = TRUE)
    temp <- bpTemp[species %in% speciesName, ]
    maxVal <- ceiling(max(raster::maxValue(ras.test.mean)))
    temp[, c("max", "model", "period", "rcp") := list(maxVal, "modMean", l, j)]
    # bpList.modMean <- rbind(bpList.modMean, temp)
  }
end_time <- Sys.time()
end_time - start_time

# end cluster
stopCluster(cl)
    I think you should post to https://codereview.stackexchange.com/ instead. – F. Privé Dec 26 '19 at 08:31
  • This is a great tip! I didn't know the codereview site existed! – JerryN Dec 26 '19 at 15:24
  • 1. Have you attempted to profile a single iteration of your loop _(no parallelization)_? I have found [this guide to profiling within RStudio](https://support.rstudio.com/hc/en-us/articles/218221837-Profiling-with-RStudio) to be immensely useful. 2. In my personal experience, optimizing run-time without trying to parallelize has typically given more speed-up than I could have ever achieved through parallelization, even on a 16 or 32 core machine 3. For someone to be able to help you, some reproducible data will likely be necessary as we cannot read your local `.csv` file. – Matt Summersgill Dec 26 '19 at 20:35
  • Haven't tried profiling this code yet. In the meantime I figured out how to print from within a worker (add `outfile = ""` to the cluster setup: `cl <- makeCluster(UseCores, outfile = "")`) and put in some timing with `print(paste("systime: ", Sys.time(), " pid: ", Sys.getpid()))` and `print(paste("l:", l, " j:", j, " k:", k, " pid:", Sys.getpid()))`. Interesting results, including a 2 *minute* window between one calc and the next for a few calcs, with most being on the order of 2-4 *seconds.* – JerryN Dec 27 '19 at 17:59
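Following the profiling suggestion in the comments, a minimal sketch of timing and profiling a single non-parallel iteration. `one_iteration` is a placeholder for the body of the `%dopar%` block (the real raster files aren't available here), so the sketch is self-contained:

```r
# Placeholder for one iteration of the loop body; the matrix work here
# stands in for the raster::stack()/stackApply() computations.
one_iteration <- function() {
  m <- matrix(rnorm(4e6), nrow = 2000)
  apply(m, 1, mean)
}

# Coarse timing of a single iteration:
elapsed <- system.time(one_iteration())["elapsed"]
print(elapsed)

# Finer-grained sampling profile: which calls dominate the run time?
Rprof(tmp <- tempfile())
invisible(one_iteration())
Rprof(NULL)
head(summaryRprof(tmp)$by.self)
```

If most of the time turns out to be in disk I/O (reading and writing the GTiff files) rather than computation, parallelization over cores will help less than expected.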

1 Answer


This answer summarizes what I have learned.

  1. `UseCores <- detectCores() - 1` assigns to `UseCores` the number of cores potentially available for parallel operation. In my case, `UseCores` is 11.
  2. To get output from print statements run at the start of each iteration, set up the cluster with a `makeCluster` command that includes `outfile = ""`. This sends all worker output to the console and makes diagnosing problems(!) much easier.
  3. I put the following print statement just after the `for` (or `foreach`) statements: `print(paste("l: ", l, " j: ", j, " k: ", k, " pid: ", Sys.getpid(), " systime: ", Sys.time()))`. `l`, `j`, and `k` are the elements my loop iterates over, the pid identifies the independent process in the cluster, and systime tells me when the iteration starts.
  4. When I run the non-parallelized code, the pid is always the same, and the difference in systime between two iterations is about 1 1/2 minutes.
  5. When I run the parallelized code, there are 11 outputs of the print statement with the same systime value, one for each core the cluster is using. The next batch of output comes from the same 11 cores, starting about 1 1/2 minutes after the first batch. In other words, each core finished its first iteration and started on another. The order of the pids varied somewhat because of slight differences in the computation time needed.
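Putting points 1-3 together, a minimal self-contained sketch of the diagnostic setup. The loop body here is just a placeholder for the real raster work, and the two-element `thiList` subset is only for illustration:

```r
library(doParallel)  # foreach parallel adaptor
library(foreach)     # foreach looping construct

UseCores <- max(1, detectCores() - 1)
# outfile = "" routes the workers' print() output back to the console
cl <- makeCluster(UseCores, outfile = "")
registerDoParallel(cl)

res <- foreach(k = c("thi.cattle", "thi.sheep")) %dopar% {
  # Diagnostic line: which iteration, which worker process, what time
  print(paste("k:", k, " pid:", Sys.getpid(), " systime:", Sys.time()))
  nchar(k)  # placeholder for the real computation
}

stopCluster(cl)
unlist(res)
```

Watching the pid and systime values in the console shows whether the workers are genuinely running concurrently and how long each iteration takes.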

But the bottom line is that I'm now able to speed up the calculations by roughly a factor of 10.

In my original question I said the code ran at about the same speed with and without parallelization. That is clearly wrong; I'm not sure what led me to think it was the case.
