
I have a bash script that performs various weekly data collection tasks and generates a report, which is then echoed into an email and sent. I have run the script manually in the Linux terminal and confirmed that I receive emails from it. The script is in the following format:

#!/bin/bash

### Code to perform data collection and generate an output text file ###

(
echo "Email greeting..."
echo "${OUTPUT}"
echo "More email stuff..."
) | mail -s "subject" "email@address"

echo "Report from ${OUTPUT} sent."

Some of the data collection tasks are quite resource intensive so I have written a batch job submission script to submit the job into a queue for HPC compute power as follows:

#!/bin/bash

#SBATCH --job-name=DATA_COLLECTION_REPORT
#SBATCH --ntasks=1
#SBATCH --time=06:00:00
#SBATCH --mem-per-cpu=250G
#SBATCH --partition=cpu

bash /PATH/TO/DATA_COLLECTION_SCRIPT.sh

Then to automate the submission, I simply used crontab to schedule the job. To illustrate, when I run crontab -l, the terminal returns the following:

0 1 * * 1 sbatch /PATH/TO/SLURM_DATA_COLLECTION_JOB_SCRIPT.sh

I can confirm the crontab job executes as I get e-mails from the cron daemon. Moreover, SLURM runs and completes my job as I have a SLURM output file which reads:

Report from output/file/path/OUTPUT_FILE.txt sent.

However, I never receive the emails.

I have also run the script from crontab with a dummy report to skip the resource-intensive data collection stage. I manually created OUTPUT_FILE.txt and installed a cron job that just sends the email. This works fine, so I presume there is an issue with SLURM running the email portion of the script.

Wing
  • It can take days until an email bounces. Do you run the SMTP server locally? If yes, check its queue and log files. If your configuration is messed up, you may not even receive the bounce emails. – Robert Aug 22 '23 at 16:42
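
The queue and log check suggested in the comment above can be sketched roughly as follows. This is only a sketch under assumptions: it presumes an MTA that provides `mailq` (Postfix and Sendmail do), the two log paths are common defaults that may not match your system, and `first_readable` is a hypothetical helper name. The logs are often readable only by root, and the host to inspect is whichever one actually ran the `mail` command.

```shell
#!/bin/bash

# Hypothetical helper: print the first readable file among the given candidates.
first_readable() {
    local candidate
    for candidate in "$@"; do
        if [ -r "${candidate}" ]; then
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    return 1
}

# Show the local mail queue, if an MTA that provides mailq is installed.
if command -v mailq >/dev/null 2>&1; then
    mailq
else
    echo "mailq not found on $(uname -n)"
fi

# Assumed log locations; adjust for your distribution and MTA.
if log_file=$(first_readable /var/log/maillog /var/log/mail.log); then
    tail -n 20 "${log_file}"
else
    echo "No readable mail log found in the assumed locations."
fi
```

Messages stuck in the queue, or a missing `mailq` on the host in question, would both point at the mail setup of that host rather than at the script.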

1 Answer


You can test the SLURM part by pinning the job to a specific node with --nodelist=one_of_your_node_name_in_the_cpu_partition and either removing --time or reducing it to XX minutes.
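
A minimal sketch of such a test job, under some assumptions: the job name `MAIL_TEST` and the 5-minute limit are arbitrary choices, the `--nodelist` value is the placeholder from above and must be replaced with a real node name, and `try_mail` is a hypothetical helper. Unlike the original script, which prints its "sent" message unconditionally, this version reports which node ran the job and what exit status `mail` returned there.

```shell
#!/bin/bash
# The node name below is a placeholder; replace it with a real node.
#SBATCH --job-name=MAIL_TEST
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --partition=cpu
#SBATCH --nodelist=one_of_your_node_name_in_the_cpu_partition

# Hypothetical helper: send a test message and report what happened.
try_mail() {
    local mail_cmd="$1" recipient="$2" status
    # A missing mail command on the compute node is reported explicitly.
    if ! command -v "${mail_cmd}" >/dev/null 2>&1; then
        echo "${mail_cmd} not found on $(uname -n)"
        return 127
    fi
    if echo "This is a test" | "${mail_cmd}" -s "subject" "${recipient}"; then
        status=0
    else
        status=$?
    fi
    echo "${mail_cmd} on $(uname -n) returned exit status ${status}"
    return "${status}"
}

# Only send when running inside a SLURM job (SLURM sets SLURM_JOB_ID).
if [ -n "${SLURM_JOB_ID:-}" ]; then
    try_mail mail "email@address"
fi
```

The SLURM output file should then show the node name and the exit status. An exit status of 0 only means that `mail` handed the message to the local mail system on that node; it does not prove delivery.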

nisakova
  • Hi, I've tested the SLURM part now. I wrote a dummy bash script that simply sends an email, added it to a SLURM job script, and requested only 5 minutes since it doesn't need to do any data collection. The SLURM job submits and runs, but I'm not getting any email. – Wing Aug 23 '23 at 11:22
  • Hi, does the SLURM job status give you a 'COMPLETED' result? – nisakova Aug 23 '23 at 23:06
  • Hello. The SLURM job status in my HPC's job composer page cycles through queued, running, and completed. Moreover, I receive a SLURM output file in which the script echoes back that the email with the test report has been successfully sent. – Wing Aug 24 '23 at 16:03
  • Expanding on my previous test, I even submitted a small job to run the `mail` command in Linux using SLURM: `#!/bin/bash` `#SBATCH config-stuff` `echo "This is a test" | mail -s "subject" "email-address(es)"` I submitted the batch job and it completes, but once again, I do not receive the email. – Wing Aug 24 '23 at 16:48
  • Maybe it's using a port that is restricted in slurm.conf. Did you check that? https://slurm.schedmd.com/network.html – nisakova Aug 24 '23 at 18:14
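
One way to probe the port question from inside a job is a plain TCP connection test. This is a rough sketch under assumptions: `can_connect` is a hypothetical helper, `localhost` and port 25 are placeholders for whatever mail relay your site uses, and whether `mail` opens an SMTP connection at all depends on how the node's mail system is configured. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command.

```shell
#!/bin/bash

# Hypothetical helper: return 0 if a TCP connection to host:port
# opens within timeout_s seconds (default 5), non-zero otherwise.
can_connect() {
    local host="$1" port="$2" timeout_s="${3:-5}"
    timeout "${timeout_s}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "${host}" "${port}" 2>/dev/null
}

# Only probe when running inside a SLURM job (SLURM sets SLURM_JOB_ID).
if [ -n "${SLURM_JOB_ID:-}" ]; then
    smtp_host="localhost"   # assumption: replace with your site's mail relay
    smtp_port=25            # assumption: standard SMTP port
    if can_connect "${smtp_host}" "${smtp_port}"; then
        echo "$(uname -n) can reach ${smtp_host}:${smtp_port}"
    else
        echo "$(uname -n) cannot reach ${smtp_host}:${smtp_port}"
    fi
fi
```

Running the same check on the login node and on a compute node would show whether the two differ in what they can reach.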