
I am running a large R function in parallel; it contains a for-loop that is executed on each core. The function starts off fine, but as the loop progresses the computer's memory usage climbs, until eventually it runs out of RAM and the process fails. I don't understand why this is happening, given that the loop neither creates nor grows objects, and would appreciate any guidance to fix the problem.

I am working on a Linux system (Ubuntu 19.04) with 16 hyperthreaded CPU cores and 128 GB of RAM. Using the doParallel package, I create a forked cluster and distribute the tasks with foreach. At each iteration of the loop, I have each slave process print its total memory use to the console (using pryr::mem_used()). I also have it print the sum of object.size() over all objects in its local environment (accessed via environment()), and the same figure for all objects in the global environment that are visible to the process.
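In condensed form, the per-iteration report from each worker amounts to something like the following (the full bookkeeping version appears further down; the names and formatting here are simplified):

core <- 1  # worker index; in the real code this is supplied by foreach
# Sum of object.size() over the worker's local environment, in MiB
local_mib  <- sum(sapply(ls(environment()), function(nm) object.size(get(nm)))) / 2^20
# Same figure for the global objects visible to the worker
global_mib <- sum(sapply(ls(.GlobalEnv), function(nm) object.size(get(nm, envir = .GlobalEnv)))) / 2^20
cat(sprintf('Core %s: mem_used() = %.1f MiB; local objects = %.1f MiB; visible global objects = %.1f MiB\n',
            core, as.numeric(pryr::mem_used()) / 2^20, local_mib, global_mib))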

At the start of my call to foreach, total memory use as shown by htop (invoked separately from the terminal) is nearly double what I would expect. There are around 12 GB of objects in the global environment; each of my 31 logical cores holds about 1 GB of objects (according to the sum of object.size()); yet memory use reported by htop is around 70 GB. Called from each core, mem_used() reports around 8.5 GB of memory use; I presume that is the roughly 1 GB of objects on each core, plus the ~7 GB of objects in the global environment that are visible to the cores.

I call gc() regularly throughout the loop, and as each core proceeds through its loop, the memory usage reported by object.size() and mem_used() stays within the same range. The memory usage reported by htop, however, steadily increases as time goes on, until it eventually reaches ~126 GB and the run crashes.
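For reference, a Linux-only way to see from inside a worker the same resident-set figure that htop reports (a sketch I am not actually running, included only to clarify what I mean by "memory use reported by htop"):

# Read this process's resident set size (the RES column in htop) from /proc
rss_mib <- function() {
  status <- readLines('/proc/self/status')
  rss_kb <- as.numeric(gsub('[^0-9]', '', grep('^VmRSS:', status, value = TRUE)))
  rss_kb / 1024
}
cat(sprintf('RSS seen by the OS: %.1f MiB\n', rss_mib()))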

Here's the essence of the call to foreach:

library(doParallel)
library(doRNG)   # provides the %dorng% operator used below

num_cores <- detectCores() - 1

SimWrapper <- function(platform, cores) {
  if (platform != 'windows') {
    # Forked cluster; outfile = '' sends worker output to the master console
    workers <- makeCluster(cores, type = 'FORK', outfile = '')
    registerDoParallel(workers)
    gc()
    return(foreach(x = 1:cores, .packages = c('data.table', 'broom', 'rlang', 'sn', 'zoo', 'stringr', 'pryr')) %dorng% SimTournament(x))
  }
}

out <- SimWrapper(.Platform$OS.type, cores = num_cores)

Before calling foreach, I've already chopped up my data into the appropriate number of pieces. Each one is a separate list object called CoreXXDTs, containing four data.tables. The parallelized SimTournament function fetches the data for each core from the global environment as follows:

SimTournament <- function(core) {
  # Fetch this core's list of four data.tables from the global environment
  ThisCoreDTs <- get(paste0('Core', core, 'DTs'), pos = .GlobalEnv)
  # Unpack the list into the local environment, prefixing each table's name with 'ThisCore'
  list2env(setNames(ThisCoreDTs, paste0('ThisCore', names(ThisCoreDTs))), envir = environment())
  # Drop the intermediate copy and trigger garbage collection
  rm(ThisCoreDTs)
  gc()
}
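For reference, the CoreXXDTs lists are assembled along these lines before the call to foreach (an illustrative sketch only; the real splitting code differs, the table name DT1 and the CoreAssignment column are stand-ins, and each list actually holds four data.tables):

library(data.table)

num_cores <- 31                                    # matches detectCores() - 1 above
DT1 <- data.table(id = 1:1e4, x = rnorm(1e4))      # stand-in for one of the real tables
DT1[, CoreAssignment := rep_len(1:num_cores, .N)]  # hypothetical column mapping rows to cores

# One list per core, named Core1DTs, Core2DTs, ..., placed in the global environment
for (core in 1:num_cores) {
  assign(paste0('Core', core, 'DTs'),
         list(DT1 = DT1[CoreAssignment == core]),
         envir = .GlobalEnv)
}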

I check memory usage for each core within SimTournament as follows. Summed over 31 logical cores, this never gets anywhere close to 128 GB (unless the 7 GB visible in the global environment are getting copied onto every logical core, but that would be more like 200 GB of usage, and I would see all that usage right at the start of the foreach loop, rather than having it rise incrementally as the loop progresses).

# Initialize empty data.table of objects in the local environment
ThisCoreMemUse <- data.table(Object = ls(environment()),
                             Size = rep(NA_character_, length(ls(environment()))),
                             SizeInMiB = rep(NA_real_, length(ls(environment()))))

# Populate the size of each object
for (i in seq_len(nrow(ThisCoreMemUse))) {
  set(ThisCoreMemUse, i, 'Size', format(object.size(get(ThisCoreMemUse$Object[i])), units = 'MiB'))
}
ThisCoreMemUse[, SizeInMiB := as.numeric(gsub(' MiB', '', Size))]
setorder(ThisCoreMemUse, -SizeInMiB)

# Repeat for objects in the global environment visible to the slave process
GlobalEnvMemUse <- data.table(Object = ls(.GlobalEnv),
                              Size = rep(NA_character_, length(ls(.GlobalEnv))),
                              SizeInMiB = rep(NA_real_, length(ls(.GlobalEnv))))

for (i in seq_len(nrow(GlobalEnvMemUse))) {
  set(GlobalEnvMemUse, i, 'Size',
      format(object.size(get(GlobalEnvMemUse$Object[i], envir = .GlobalEnv)), units = 'MiB'))
}
GlobalEnvMemUse[, SizeInMiB := as.numeric(gsub(' MiB', '', Size))]

# Calculate total reported memory usage for the local and global environments, and combine into one data.table
ThisCoreMemUse <- rbindlist(list(
  data.table(Object = 'Total on core',         Size = NA_character_, SizeInMiB = sum(ThisCoreMemUse$SizeInMiB)),
  data.table(Object = 'Visible in global env', Size = NA_character_, SizeInMiB = sum(GlobalEnvMemUse$SizeInMiB)),
  ThisCoreMemUse
))

# Print results to the console: the two summary rows plus the five largest objects
print(paste0('Core number ', core, ' has total memory use of ', mem_used(), '. Its five largest objects are:'))
print(head(ThisCoreMemUse, 7))

Memory use increases until it reaches around 125 GB. I then get the following error, and the script aborts.

Error in unserialize(socklist[[n]]) : error reading from connection
Calls: SimWrapper ... recvOneData -> recvOneData.SOCKcluster -> unserialize
Execution halted
