I am using SLURM_TMPDIR on Compute Canada for some I/O-intensive work, like cloning large repositories and analyzing their commit histories. The problem is that when the job runs out of its allotted time, I lose the output file sitting inside SLURM_TMPDIR. I read about signal trapping here, but since I am not experienced in systems programming, my understanding may be off and I can't achieve what I intend. Here is my batch script, but it never traps the signal and copies the output to my desired location.
#!/bin/bash
#SBATCH --mem=128G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=0:10:0
#SBATCH --signal=B:SIGUSR1@120
output_file_name=file_0000.jsonl
echo "Start"
function handle_signal()
{
    echo 'Moving File'
    cp "$SLURM_TMPDIR"/<output_file_path> <my_compute_canada_directory>
    exit 2
}
trap 'handle_signal' SIGUSR1
cd $SLURM_TMPDIR
git clone ...
cd ...
module purge
module load java/17.0.2
module load python/3.10
export JAVA_TOOL_OPTIONS="-Xms256m -Xmx5g"
python -m venv res_venv
source res_venv/bin/activate
pip install -r requirements.txt
python data_collector.py ./data/file_0000.csv $output_file_name
wait
echo "Test"
exit 0
But it doesn't even print 'Moving File'. Can someone please guide me on how to use a signal trap correctly with SLURM_TMPDIR? It should copy the specified file when the job runs out of its allotted time, and also copy it once my Python script finishes. Thanks!
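For context, here is my understanding of the trap pattern from what I read, boiled down to a self-contained sketch that runs without SLURM. The `kill -USR1` subshell and the `sleep 30` workload are stand-ins I made up for SLURM's `--signal=B:SIGUSR1@120` delivery and for my Python script; the real `cp` is replaced by setting a flag so the demo has no filesystem side effects.

```shell
#!/bin/bash
# Standalone demo of the trap pattern (no SLURM required).

saved=""    # set by the handler instead of a real `cp`

handle_signal() {
    echo 'Moving File'
    # the real job would do: cp "$SLURM_TMPDIR"/<output_file_path> <dest>
    saved="yes"
}
trap 'handle_signal' USR1

# The long-running work is backgrounded: bash defers trap handlers
# while a foreground command is running, but a trapped signal does
# interrupt the `wait` builtin immediately.
sleep 30 &
work_pid=$!

# Simulate SLURM signalling the batch shell after one second.
( sleep 1; kill -USR1 $$ ) &

wait $work_pid              # returns early when the trap fires
kill $work_pid 2>/dev/null  # tidy up the simulated workload
echo "handler ran: $saved"
```

If I understand correctly, running the script should print 'Moving File' after about a second rather than after the full 30-second workload.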