
I need to submit many jobs to a cluster with Slurm. Each job takes different input files from different folders. My problem is that the output is incomplete: outputs after the first 8 combinations keep overwriting the previous ones. I suspect the job array is not being created correctly from the combination of the two variables. Here is my code sample:

#!/bin/bash

#SBATCH --array=1-57%12         
#SBATCH --time=0            
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=6G
#SBATCH --output=/storage/proj/AltSp/logs/Lastz_Intron.log

DIR_OUT="/storage/proj/AltSp/data/annotation/Lastz/Br"
mkdir -p ${DIR_OUT}

QUERY="/storage/proj/AltSp/data/annotation/Introns.txt"
Species=/storage/proj/AltSp/data/Species.list   #3 lines: Br\nBn\nBo\n

# Chroms=/storage/proj/AltSp/genomes/Br/chromosomes.list # 20 lines: A1 ~ A20, one at a line
# Chroms=/storage/proj/AltSp/genomes/Bn/chromosomes.list # 18 lines: B1 ~ B18, one at a line
# Chroms=/storage/proj/AltSp/genomes/Bo/chromosomes.list # 19 lines: C1 ~ C19, one at a line

# REF is changing according to spc and chr

for spc in $(cat ${Species}); do
    chr=$(head -n ${SLURM_ARRAY_TASK_ID} genomes/${spc}/chromosomes.list | tail -1)
    REF="/storage/proj/AltSp/genomes/${spc}/${chr}.fasta"
    
    lastz ${REF} ${QUERY} K=3000 H=2200 --format=axt+ > ${DIR_OUT}/introns_vs_${spc}-${chr}.axt
done

The output files are:

    introns_vs_Br-A01.axt
    introns_vs_Br-A02.axt
    ...
    introns_vs_Br-A08.axt
    

spc comes from a single file, one name per line; chr comes from several files (one per species), also one name per line. REF changes with each combination of spc and chr, giving 57 files in total. The 57 array jobs are submitted with sbatch, running 12 at a time in my allocation.

What is wrong with the way my sample code uses SLURM_ARRAY_TASK_ID while looping over the two variables spc and chr, such that the output gets overwritten? Thanks!

Yifangt

1 Answer

I think the issue is in how you obtain $chr. To verify, add ${SLURM_ARRAY_TASK_ID} to the output file name, for example:

lastz ${REF} ${QUERY} K=3000 H=2200 --format=axt+ > ${DIR_OUT}/introns_vs_${spc}-${chr}-task${SLURM_ARRAY_TASK_ID}.axt

If 57 output files are then generated, the issue is in how you obtain $chr.
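One way to see why $chr can repeat is to try the same `head | tail -1` pattern on a small list. The sketch below uses a hypothetical three-line list and a hypothetical helper `nth_line` (neither is from the question); it only illustrates that an index larger than the number of lines returns the last line again, which would make different array tasks write to the same output file.

```bash
#!/bin/bash
# Hypothetical three-line chromosome list, standing in for chromosomes.list.
list=$(mktemp)
printf 'A1\nA2\nA3\n' > "$list"

# Same pattern as in the question: take the first N lines, keep the last one.
nth_line() {
    head -n "$1" "$2" | tail -1
}

nth_line 2 "$list"   # prints A2
nth_line 5 "$list"   # prints A3: the index is past the end, so the last line comes back
```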

j23
  • @j23 Thanks! I found that the problem is the line: **chr=$(head -n ${SLURM_ARRAY_TASK_ID} genomes/${spc}/chromosomes.list | tail -1)** When **spc** moves to a new folder, **SLURM_ARRAY_TASK_ID** does not reset to start from the beginning of chromosomes.list, causing **chr** to use the old value. – Yifangt Mar 03 '22 at 14:55
  • @Yifangt That's good to hear :-) If the answer helped, you can accept the answer ;-) – j23 Mar 03 '22 at 14:56
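
Following up on the cause found in the comments: in the script from the question, every array task loops over all three species and uses the same index for each list, so no task corresponds to exactly one (spc, chr) combination. A possible fix, sketched here rather than tested on the actual cluster, is to count through all combinations and let each task pick only the one whose running number equals SLURM_ARRAY_TASK_ID. The helper name `pair_for_task` is hypothetical, and the absolute genomes path is an assumption (the question reads the lists from a relative `genomes/` path).

```bash
#!/bin/bash
#SBATCH --array=1-57%12

# Print "<species> <chromosome>" for the Nth combination (1-based),
# counting through every chromosomes.list in species order.
# Returns 1 if N is larger than the total number of combinations.
pair_for_task() {
    local task_id=$1 species_list=$2 genomes_dir=$3
    local spc chr n=0
    while read -r spc; do
        [ -n "$spc" ] || continue          # skip empty lines
        while read -r chr; do
            [ -n "$chr" ] || continue
            n=$((n + 1))
            if [ "$n" -eq "$task_id" ]; then
                echo "$spc $chr"
                return 0
            fi
        done < "$genomes_dir/$spc/chromosomes.list"
    done < "$species_list"
    return 1
}

# Only run the alignment inside a Slurm array task.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    GENOMES="/storage/proj/AltSp/genomes"   # assumed absolute location of the lists
    QUERY="/storage/proj/AltSp/data/annotation/Introns.txt"
    DIR_OUT="/storage/proj/AltSp/data/annotation/Lastz/Br"
    mkdir -p "${DIR_OUT}"

    pair=$(pair_for_task "${SLURM_ARRAY_TASK_ID}" \
        /storage/proj/AltSp/data/Species.list "${GENOMES}") || exit 1
    read -r spc chr <<< "${pair}"

    # One task, one combination, one output file; no loop over species.
    lastz "${GENOMES}/${spc}/${chr}.fasta" "${QUERY}" K=3000 H=2200 --format=axt+ \
        > "${DIR_OUT}/introns_vs_${spc}-${chr}.axt"
fi
```

With the list sizes given in the question (20, 18 and 19 lines), tasks 1 to 20 would map to Br, 21 to 38 to Bn, and 39 to 57 to Bo, so each of the 57 tasks writes a distinct file.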