I get the following error when trying to submit a job with sbatch:

An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).

When I run sbatch with no options, the job runs fine, but as soon as I pass any option (e.g. --job-name or --export), the above error appears.

I am using Open MPI 3 and running a Python script with mpirun. Both mpirun and orted appear to come from the same Open MPI installation, as shown by calling which in my Slurm script right before mpirun:

which mpirun: /opt/openmpi30/bin/mpirun
which orted: /opt/openmpi30/bin/orted

Any help would be greatly appreciated.
