
I have a big RDS file that I want to work with in parallel using R. The file takes 7.3 GB of RAM when loaded.

If I try to use many cores, R crashes because it runs out of memory. Is there a way to tell mclapply to use shared memory instead of making copies of the object?
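Here's a toy example of the kind of read-only sharing I'm after (made-up data, not my actual object): several workers read one large object without each process holding its own copy.

library(parallel)
big <- rnorm(1e8)                                          # roughly 800 MB
sums <- mclapply(1:4, function(i) sum(big), mc.cores = 4)  # every worker only reads 'big'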

This is the code that I have:

results <- readRDS('ResultsICC.RDS')

rand <- 0
Icc <- c(.5, 1, 1.5)
n <- c(.1, .5, 1)
phi <- c(0, .5, 1)
parameterSettings <- expand.grid(rand=rand, Icc=Icc, n=n, phi=phi)

rr <- list()
Ns <- results[[1]][[1]][[2]][,c('Country', 'n')]
EstimatedBestPFiveArmRaw <- matrix(NA, 26, 1000)
EstimatedBestP <- matrix(NA, 26, 1000)


outterloop <- function(dataIN){
  for(k in 1:1000){  # loop over the 1000 elements of dataIN
    best <- dataIN[[k]][[2]]
    EstimatedBestPFiveArmRaw[,k] <- rep(weighted.mean(best$estimatedBestPFiveArmRaw, best$n), 26)
    pHat <- dataIN[[k]][[3]]
    best <- Ns
    best$estimatedBest <- best$estimatedBestP <- NA
    for(j in 1:26){ #26
      best$estimatedBest <- sapply(split(pHat[,paste0('cohort', j+1, 'pHat')], pHat$Country), 
                                   which.max)
      for(i in 1:nrow(best)) #nrow(best)
        best$estimatedBestP[i] <- pHat$p[pHat$Country==best$Country[i] &
                                           pHat$treatNum==best$estimatedBest[i]]  
      EstimatedBestP[j, k] <- weighted.mean(best$estimatedBestP, best$n)
    }
  }
  # percentage difference between the two estimates
  rr <- (EstimatedBestP/EstimatedBestPFiveArmRaw - 1)*100
  return(rr)
}
library(parallel)
rr <- mclapply(X = results, FUN = outterloop, mc.cores = 27, mc.preschedule = TRUE)

I'm running this on a Linux box with 32 cores and 64 GB of RAM.

Thanks!

Ignacio
  • As an overly broad comment and tip that doesn't directly answer your question, check out the [`data.table`](http://cran.r-project.org/web/packages/data.table/index.html) package. It doesn't copy data when doing operations. I don't know the specifics of how it could help you, but it is probably worth checking out. – Richard Erickson Apr 27 '15 at 15:51
  • Matloff has written an R package that implements shared memory for multiprocessing. IIRC its name is "sms". – IRTFM Apr 27 '15 at 16:02
  • The multicore backend, such as with the doMC package, would share memory. You'd just need to replace your `mclapply()` with a `foreach` loop. – Dominik Apr 27 '15 at 16:25
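A minimal, untested sketch of the `foreach`/`doMC` replacement suggested in the last comment (it assumes the fork-based multicore backend and the same `outterloop()` and `results` as in the question; the forked workers should inherit those objects from the parent session rather than receiving copies over a connection):

library(foreach)
library(doMC)
registerDoMC(cores = 27)   # fork-based multicore backend, like mclapply
rr <- foreach(d = results) %dopar% outterloop(d)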

0 Answers