I have the following SLURM job script named gzip2zipslurm.sh:
#!/bin/bash
#SBATCH --mem 70G
#SBATCH --ntasks 4
echo "Task 1"
srun -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.A-B.xml.tar.gz &
echo "Task 2"
srun -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.C-H.xml.tar.gz &
echo "Task 3"
srun -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.I-N.xml.tar.gz &
echo "Task 4"
srun -n1 java -Xmx10g -jar tar2zip-1.0.0-jar-with-dependencies.jar articles.O-Z.xml.tar.gz &
echo "Waiting for job steps to end"
wait
echo "Script complete"
I submit it to SLURM with sbatch gzip2zipslurm.sh.
When I do, the SLURM log file contains:
Task 1
Task 2
Task 3
Task 4
Waiting for job steps to end
The tar2zip program reads the given tar.gz file and re-packages it as a ZIP file.
The problem: only one CPU (out of 16 available on an idle node) is doing any work. With top I can see that 5 srun commands have been started in total (4 for my tasks and, I guess, 1 implicit one for the sbatch job), but there is only one Java process. I can also see it from the output files: only one is being written.
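To rule out the shell side of the script, the same "&" plus "wait" pattern can be checked without SLURM. The following is only a minimal sketch: `sleep 2` is a stand-in for the Java command (an assumption for illustration, not part of my actual job), and the timing variables are hypothetical helpers.

```shell
#!/bin/bash
# Sanity check of the "&" + "wait" pattern, outside of SLURM.
# "sleep 2" stands in for the java command (assumption for illustration).
start=$(date +%s)
for i in 1 2 3 4; do
    echo "Task $i"
    sleep 2 &          # start each stand-in task in the background
done
echo "Waiting for background jobs to end"
wait                   # block until all four background jobs have finished
end=$(date +%s)
elapsed=$((end - start))
echo "Elapsed: ${elapsed}s"
```

If the four background jobs really overlap, the elapsed time stays close to 2 seconds rather than the 8 seconds a serial run would need. That would suggest the backgrounding itself is fine and that the serialization happens at the srun job-step level.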
How can I get all 4 tasks to actually run in parallel?
Thanks for any hints!