Background: I am using the adehabitatHR package to create utilization distribution estimates (UDEs) of wildlife populations. I have a list of UDEs that can be converted into spatial vector objects using vect() from the 'terra' package. See lines 657-705 for my current workflow.
If I were to do this on my local machine:
spatvec_UDEs <- lapply(UDEs, function(x){vect(x)})
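One wrinkle worth noting here (my assumption, based on the terra documentation): terra objects such as SpatVector hold external pointers to C++ objects, so they cannot be serialized directly, e.g. when returned from a parallel worker or saved with saveRDS(). A sketch of the workaround:

```r
library(terra)
# wrap() converts a SpatVector to a PackedSpatVector, which serializes
# safely; unwrap() restores it afterwards (recent terra versions; older
# versions restore the packed object with vect() instead).
packed_UDEs  <- lapply(UDEs, function(x) wrap(vect(x)))
spatvec_UDEs <- lapply(packed_UDEs, unwrap)
```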
I have thousands of UDEs to calculate. Using the parallel package, I have computed the UDEs for some animals (e.g., winter range in one study area, 2002-2021), but I have been unable to save and export the result on the ACENET Siku cluster.
My notes, progress, and a minimal reproducible example are included below. My question is about how memory maps onto my SLURM requests. I am looking for a working example of parLapply(), mclapply(), or future::plan(multicore) that helps me understand how the memory is being used and how to get a parallel function running on either a single core or multiple cores. I believe multiple cores are needed in my case to meet the memory requirements of running the adehabitatHR functions on my data.
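Since future::plan(multicore) is one of the routes I am asking about, here is the sketch I have in mind (assumptions: a single node; `UDEs` is my list of estUD objects; multicore forks the current session, so it cannot span nodes):

```r
library(future)
library(future.apply)
# Size the worker pool from the SLURM allocation rather than detectCores()
plan(multicore, workers = as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "1")))
# wrap() because terra objects hold external pointers that do not
# survive being sent back from a worker
spatvec_UDEs <- future_lapply(UDEs, function(x) terra::wrap(terra::vect(x)))
```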
My understanding is as follows: I call a function to process elements of a list of adehabitatHR estUD objects. Memory use during processing reaches about 2.5G per element, and I am running about 200 tasks in my foreach loop. I am not sure I am using "task" correctly here, but I think of each iteration through the loop that performs a single calculation as one task. I believe one node gives me access to ~720G, the memory on a Siku node. If each estUD creation task needs ~4G, then I should have sufficient room to run 180 tasks (720G / 4G = 180).
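To check my arithmetic, these are the standard SLURM environment variables visible inside an allocation. My understanding (please correct me) is that the job's memory is cpus-per-task × mem-per-cpu, i.e. what I requested, not automatically the full ~720G of the node:

```r
# Inside the salloc/sbatch session:
cpus           <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
mem.per.cpu.mb <- as.numeric(Sys.getenv("SLURM_MEM_PER_CPU", unset = NA))
total.gb       <- cpus * mem.per.cpu.mb / 1024   # e.g. 16 * 11937M ~ 187G
```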
For example, does R have to be initialized on each node in interactive mode, or does requesting two nodes result in shared memory (e.g., 720G x 2 nodes = 1440G)? I am confused about distributed versus shared memory and how each behaves in SLURM interactive (salloc) and batch (sbatch) job submissions on the Siku HPC.
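To make the shared-versus-distributed distinction concrete for myself, a sketch (`UDEs` assumed as above). As I understand it, mclapply() and FORK clusters fork the current R process, so workers share the parent's memory copy-on-write but can never leave the node; two nodes do not combine into 1440G. PSOCK clusters start fresh R processes, each with its own memory, and inputs must be shipped to them:

```r
library(parallel)

# Shared memory, single node only (fork-based):
res <- mclapply(UDEs, function(x) terra::wrap(terra::vect(x)), mc.cores = 8)

# Distributed memory (socket-based): each worker is a separate R process,
# so packages must be loaded and objects serialized to each worker
cl <- makeCluster(8, type = "PSOCK")
clusterEvalQ(cl, library(terra))
res <- parLapply(cl, UDEs, function(x) wrap(vect(x)))
stopCluster(cl)
```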
I was using interactive job submissions (salloc) to learn, and here are my notes:
#BRB1: Results in a strange, infinitely repeating "Selection:" prompt
#with mclapply
salloc --time=3:00:0 --ntasks=1 --cpus-per-task=8 --mem-per-cpu=23875M
#BRB2: Elapsed 4619 s (~1 h 17 min); "not a valid cluster" error from parLapply()
salloc --time=3:00:0 --ntasks=1 --cpus-per-task=16 --mem-per-cpu=11937M
#BRB3: Elapsed 4700 s (slower than BRB2). In this case N=16, where N is
#the number of (OpenMP) parallel threads from --cpus-per-task. Adding
#cl=cl to parLapply() failed.
salloc --time=3:00:0 --nodes=1 --ntasks=1 --cpus-per-task=16 --mem-per-cpu=11937M
#BRB4: Elapsed 4262.420 s; fastest so far, but parLapply() still failed.
salloc --time=3:00:0 --nodes=1 --cpus-per-task=20 --mem-per-cpu=11937M
#BRB5: Elapsed 4565.692 s, so the extra mem-per-cpu did not make it faster.
#Attempted:
BRBs_BM_WIv <- slurm_map(BRBs_BM_WI,          # slurm_map() is from 'rslurm'
                         function(x) vect(x),
                         nodes = 1,
                         cpus_per_node = 1)
#Result:
"sbatch: error: Batch job submission failed: Requested partition
configuration not available now."
salloc --time=3:00:0 --nodes=1 --cpus-per-task=20 --mem-per-cpu=18500M
This is a minimal reproducible example in which I attempt to save the results as .Rds files:
#BRB6:
salloc --time=3:00:0 --nodes=2 --cpus-per-task=20 --mem-per-cpu=18500M
#BRB6 uses the minimal code example below. The example dataset ships
#with the adehabitatHR package:
library('adehabitatHR')
library('adehabitatLT')
library('foreach')
library('doParallel')  # %dopar% needs a registered backend, or it runs sequentially
registerDoParallel(parallel::detectCores() - 1)
data(puechcirc)
# One single-burst ltraj object per list element
Traj_li <- list(puechcirc[1], puechcirc[2], puechcirc[3])
DLik <- c(2.1, 2.2, 4)
system.time({
  BRBs_PC <- foreach(i = seq_along(Traj_li),
                     .combine = c,
                     .packages = c("adehabitatHR", "adehabitatLT")) %dopar% {
    BRB(Traj_li[[i]],
        D = DLik[i],
        Tmax = 1500*60,
        Lmin = 2,
        hmin = 20,
        type = "UD",
        grid = 4000)
  }
})
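The same loop written with mclapply(), one of the functions I am asking about (a single-node sketch, reusing the Traj_li and DLik objects defined above):

```r
library(parallel)
# Fork-based, so only valid within one node; pool sized from the allocation
BRBs_PC <- mclapply(seq_along(Traj_li), function(i) {
  BRB(Traj_li[[i]], D = DLik[i], Tmax = 1500 * 60,
      Lmin = 2, hmin = 20, type = "UD", grid = 4000)
}, mc.cores = as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "1")))
```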
thenames <- c("pn1", "pn2", "pn3")
library(here)
# slurm_map() passes one list element per call, so index into 'thenames'
# rather than pasting the whole vector into a single path:
WRds <- function(i) {saveRDS(BRBs_PC[[i]],
                             paste0(here("BRB_UDs"), "/",
                                    thenames[i], ".Rds"))
}
# Objects used inside the function must be shipped to the workers
# (global_objects argument of rslurm::slurm_map()):
BRBs_BM_WIv <- slurm_map(seq_along(BRBs_PC), f = WRds,
                         nodes = 2, cpus_per_node = 1,
                         global_objects = c("BRBs_PC", "thenames"))
The BRB6 result from the minimal reproducible example (2 nodes):
sbatch: error: Batch job submission failed: Requested partition configuration not available now
Error in strsplit(sys_out, " ")[[1]] : subscript out of bounds
In addition: Warning message:
In system("sbatch submit.sh", intern = TRUE) :
running command 'sbatch submit.sh' had status 1
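As far as I can tell, the sbatch error comes from SLURM itself, not from R: the generated script requests a partition layout that Siku will not grant. rslurm writes the script it generates into a `_rslurm_<jobname>` directory under the working directory, so it can be inspected (a sketch; the directory name depends on the jobname rslurm reported):

```r
# Inspect the batch script rslurm generated, to see exactly what
# was requested from SLURM:
rslurm_dirs <- list.files(pattern = "^_rslurm_")
cat(readLines(file.path(rslurm_dirs[1], "submit.sh")), sep = "\n")
```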
These are some of the approaches I have tried as I read and learn about parallel processing on a cluster:
n.cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
mclapply(X = BRBs_BM_WIv, FUN = WV, mc.cores = n.cores)  # WV is defined further below
my.cluster <- parallel::makeCluster(
  n.cores,
  type = "FORK"
)
doParallel::registerDoParallel(cl = my.cluster)
WVec <- function(x) {vect(x)}
BRBs_BM_WIv <- parLapply(cl = my.cluster,
                         BRBs_BM_WI,
                         fun = WVec)  # was 'FUN = MVec': MVec was never defined,
                                      # and parLapply()'s argument is 'fun'
stopCluster(my.cluster)
my.cluster <- parallel::makeCluster(
  n.cores,
  type = "FORK"
)
doParallel::registerDoParallel(cl = my.cluster)
# Index 'thenames' per element rather than pasting the whole vector
# into one path:
WV <- function(i) {writeVector(BRBs_BM_WIv[[i]],
                               paste0(here("BRB_UDs"), "/",
                                      thenames[i], ".shp"),
                               overwrite = TRUE)}
parLapply(cl = my.cluster, X = seq_along(BRBs_BM_WIv), fun = WV)  # was 'FUN = MV' (undefined)
stopCluster(my.cluster)
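Putting the pieces together, this is the end-to-end pattern I am aiming for (a sketch, assuming a single-node allocation and that `BRBs_BM_WI` is the list of estUD objects; the file names are illustrative). Writing inside the worker sidesteps returning terra objects, which do not serialize across workers:

```r
library(parallel)
library(terra)

n.cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
cl <- makeCluster(n.cores, type = "FORK")   # FORK: single node, shared memory

paths <- parLapply(cl, seq_along(BRBs_BM_WI), function(i) {
  v <- vect(BRBs_BM_WI[[i]])
  f <- file.path("BRB_UDs", paste0("UD_", i, ".shp"))
  writeVector(v, f, overwrite = TRUE)
  f   # return the file path, not the SpatVector
})
stopCluster(cl)
```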
I have been reading extensively on this topic (for example), but have been unable to find specific examples that can help me with the issues I am facing.