I frequently use the packages future.apply and future to parallelize tasks in R. This works perfectly well on my local machines. However, if I try to use them on a computer cluster managed by PBS/TORQUE, the job gets killed for violating the resources policy. After reviewing the processes, I noticed that resources_used.mem and resources_used.vmem, as reported by qstat, are far higher than the memory the job actually uses. Is there any way to fix this?
Note: I already know and use the packages batchtools and future.batchtools, but they generate jobs to submit to the queues, which requires organizing the scripts in a particular way, so I would like to avoid them for this specific example.
I have prepared the following minimal example. As you can see, the code simply allocates a vector with 10^9 elements and then performs some operations on it in parallel using future_lapply (here just a trivial check).
library(future.apply)
plan(multicore, workers = 12)
sample <- rnorm(n = 10^9, mean = 10, sd = 10)
print(object.size(sample) / (1024 * 1024)) # size in MB; fills ~8 GB of RAM
options(future.globals.maxSize = +Inf)
options(future.gc = TRUE)
future_lapply(future.seed = TRUE,
    X = 1:12, function(idx) {
        # just do some stuff
        for (i in sample) {
            if (i > 0) dummy <- 1
        }
        return(dummy)
    })
If run on my local computer (no PBS/TORQUE involved), this works well (meaning no problems with RAM), assuming 32 GB of RAM are available. However, if run through PBS/TORQUE on a machine that has enough resources, like this:
qsub -I -l mem=60Gb -l nodes=1:ppn=12 -l walltime=72:00:00
the job gets automatically killed for violating the resources policy. I am pretty sure this is because PBS/TORQUE is not measuring the resources used correctly, since if I check
qstat -f JOBNAME | grep used
I get:
resources_used.cput = 00:05:29
resources_used.mem = 102597484kb
resources_used.vmem = 213467760kb
resources_used.walltime = 00:02:06
This tells me that the process is using ~102 GB of mem and ~213 GB of vmem. It is not: if you monitor the node with e.g. htop, it is using the correct amount of RAM, but PBS/TORQUE is reporting much more.
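
One way to put numbers on this mismatch is sketched below. It is a minimal sketch that assumes a Linux node exposing /proc/<pid>/smaps_rollup (kernel 4.14 or later). The multicore plan forks the workers, so they share the pages of sample with the parent until something writes to them. RSS counts such a shared page once in every process that maps it, whereas PSS divides it among those processes. If the scheduler's figure is close to the summed RSS (an assumption, not something confirmed here), the gap between the two totals would account for the inflated report. The helper names read_kb and memory_summary are made up for illustration.

```r
# Linux only: total RSS vs. total PSS for a set of processes.
# RSS counts a shared page in every process that maps it;
# PSS splits each shared page among the processes sharing it.

read_kb <- function(path, field) {
  # Return the value (in kB) of the "<field>:" line in a /proc file, or NA.
  if (!file.exists(path)) return(NA_real_)
  lines <- tryCatch(readLines(path, warn = FALSE),
                    error = function(e) character(0))
  hit <- grep(paste0("^", field, ":"), lines, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  sum(as.numeric(gsub("[^0-9]", "", hit)))
}

memory_summary <- function(pids) {
  # pids: integer or character vector of process IDs owned by the current user.
  paths <- sprintf("/proc/%d/smaps_rollup", as.integer(pids))
  rss <- vapply(paths, read_kb, numeric(1), field = "Rss")
  pss <- vapply(paths, read_kb, numeric(1), field = "Pss")
  # kB -> GiB; unreadable processes are skipped.
  c(rss_gib = sum(rss, na.rm = TRUE) / 1024^2,
    pss_gib = sum(pss, na.rm = TRUE) / 1024^2)
}

# Demo on the current R process alone.
print(memory_summary(Sys.getpid()))
```

To inspect a running job, this could be called from a second R session on the same node, passing the PID of the R process that runs future_lapply together with the PIDs of its forked children (for example, the output of pgrep -P with that PID). A summed RSS many times larger than the summed PSS would indicate that the shared vector is being counted once per worker.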