
I submit my job using Slurm and, at the beginning, everything works well. After adding an Rscript that performs a simple filtering step, the system load average suddenly shoots up to 1000+, which is quite abnormal. I've tried searching Google but found nothing. My code is as follows:

#!/bin/bash

#SBATCH --job-name=gtool
#SBATCH --partition=Compute
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -a 1-22

for file in output/impute2/data_chr"${SLURM_ARRAY_TASK_ID}".*impute2
do
  echo "$file" start!
  # file prefix
  foo=$(echo "$file" | awk -F "/" '{print $NF}' | awk -F . '{print $1"."$2}')
  # use R to build the list of SNP IDs to keep
  Rscript src/detect.impute.snp.r "$file"
  # gtool subset
  gtool -S \
    --g "$file" \
    --s output/pre_phasing/chr"${SLURM_ARRAY_TASK_ID}".sample \
    --og output/impute2_subset/"$foo".gen \
    --inclusion output/impute2_subset/"$foo".SNPID.txt
  # gtool GEN to PED 
  gtool -G \
    --g output/impute2_subset/"$foo".gen \
    --s output/pre_phasing/chr"${SLURM_ARRAY_TASK_ID}".sample \
    --ped output/impute2_subset_2_PLINK/"$foo".impute2.ped \
    --map output/impute2_subset_2_PLINK/"$foo".impute2.map \
    --chr "${SLURM_ARRAY_TASK_ID}" \
    --snp
  echo "$file" fin!
done

Rscript:

options(tidyverse.quiet = TRUE)
options(readr.show_col_types = FALSE)
library("tidyverse")

args <- commandArgs(trailingOnly = TRUE)
fn <- args[1]
# read the SNP ID column and the two allele columns from the IMPUTE2 file
d <- read_delim(fn,
  col_names = FALSE,
  delim = " ",
  col_select = c(2, 4, 5))

# file prefix: basename without the trailing ".impute2" (last 8 characters)
fn.out <- str_sub(last(str_split(fn, "/")[[1]]), 1, -9)
# keep only SNPs whose alleles are single characters (i.e. drop indels)
# and write their IDs to the inclusion list used by gtool
d %>% mutate(len1 = nchar(X4),
             len2 = nchar(X5)) %>%
  arrange(desc(X4), desc(X5)) %>%
  filter(len1 == 1, len2 == 1) %>%
  select(X2) %>%
  write_tsv(file = str_c("output/impute2_subset/", fn.out, ".SNPID.txt"),
            col_names = FALSE)

scontrol also shows that my job only uses one CPU:

JobId=4873 ArrayJobId=4872 ArrayTaskId=1 JobName=gtool
   ......
   NodeList=localhost
   BatchHost=localhost
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   ......

R and gtool run single-threaded and don't expose a thread parameter, and --ntasks is also set to 1, so where could the problem be?

sirenfrappe
  • Have you tried `#SBATCH -a 1-22%1`? Quote: "To throttle a job array by keeping only a certain number of tasks active at a time use the %N suffix where N is the number of active tasks" – Ryan SY Kwan Sep 28 '22 at 09:30

1 Answer


Some libraries used by R and/or gtool, such as MKL, BLIS or OpenBLAS, might be configured system-wide to use all cores of the node and not detect that Slurm only allocated one CPU. You can try adding

export OMP_NUM_THREADS=1
export BLIS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

in your submission script, just before the for loop (see the sketch below).
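For illustration, this is roughly how the exports would sit in the submission script from the question; it is simply the question's script with the four lines added and the loop body elided:

#!/bin/bash

#SBATCH --job-name=gtool
#SBATCH --partition=Compute
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -a 1-22

# cap the thread pools of any OpenMP/BLAS backend at the single allocated CPU
export OMP_NUM_THREADS=1
export BLIS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

for file in output/impute2/data_chr"${SLURM_ARRAY_TASK_ID}".*impute2
do
  # ... loop body from the question, unchanged ...
done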

damienfrancois
  • thanks! BTW, I also wonder how to locate which library is causing this problem when I run into the same situation with a different language or tool. For example, when I use the `future` package in an R terminal, the load average can reach 3000+. – sirenfrappe Sep 29 '22 at 10:04
  • other than reading the documentation of the package and carefully screening all their dependencies, or their source code, there is no easy way. You could insert debugging statements in your code when a package is used and try to correlate that with the load of the server. – damienfrancois Sep 29 '22 at 13:44
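Building on that last comment (this is not from the original thread): one crude way to correlate a step with the server load is to sample the thread count of a suspect command while it runs. The helper below is a hypothetical sketch; `watch_threads` is a made-up name, and it relies only on standard Linux `ps` (NLWP is the number of threads of a process). It could be dropped into the loop in place of the plain `Rscript` call.

# hypothetical helper: run a command and periodically print its thread count,
# so a sudden jump points at the library spawning extra threads
watch_threads() {
  "$@" &                       # start the suspect command in the background
  local pid=$!
  while kill -0 "$pid" 2>/dev/null; do
    printf '%s threads=%s\n' "$1" "$(ps -o nlwp= -p "$pid")"
    sleep 5
  done
  wait "$pid"                  # propagate the command's exit status
}

# example: wrap the R step from the loop to see whether it is the culprit
watch_threads Rscript src/detect.impute.snp.r "$file"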