We are running a small cluster environment with Intel Xeon nodes connected via InfiniBand. The login node is not attached to the InfiniBand interconnect. All nodes run Debian Jessie.
We run Slurm 14.03.9 on the login node. Since the system OpenMPI is outdated and does not support the MPI-3 interface (which I require), I compiled a custom OpenMPI 2.0.1.
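For reference, the build was configured with Slurm, PMI and verbs support, roughly like this (the install prefix and the PMI path are placeholders for our local setup):

    ./configure --prefix=/opt/openmpi-2.0.1 \
                --with-slurm \
                --with-pmi=/usr \
                --with-verbs
    make -j8 && make install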
When I start MPI jobs by hand via
mpirun --hostfile hosts -np xx program_name,
it runs fine, including across multiple nodes, and takes full advantage of InfiniBand. Good.
However, when I call my MPI application from inside a Slurm runscript, it crashes with strange segfaults. I compiled OpenMPI with Slurm support, and PMI also seems to work, so I can simply write
mpirun program_name
in the Slurm runscript, and it automatically dispatches the processes to the correct nodes with the correct number of CPU cores. However, I keep getting these segfaults.
Explicitly specifying "-np" and "--hostfile" to mpirun in the Slurm runscript does not help either. The exact same command that runs fine when started by hand leads to a segfault when started inside the Slurm environment.
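A stripped-down version of the runscript looks like the following (node and task counts are just example values, and program_name stands for the actual binary):

    #!/bin/bash
    #SBATCH --job-name=mpi_test
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=00:10:00

    # With Slurm/PMI support built in, mpirun picks up the allocation
    # from Slurm, so no -np or --hostfile is needed here.
    mpirun program_name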
Before the segfaults occur, I get the following error message from OpenMPI:
--------------------------------------------------------------------------
Failed to create a completion queue (CQ):
Hostname: xxxx
Requested CQE: 16384
Error: Cannot allocate memory
Check the CQE attribute.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.
You job will continue, but Open MPI will ignore the "ud" oob component
in this run.
Hostname: xxxx
--------------------------------------------------------------------------
I googled the error, but did not find much useful information. I assumed it might be a limit on locked memory, but executing "ulimit -l" on the compute nodes returns "unlimited", as it should.
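If it matters, that value is from an interactive shell on the nodes; I assume the limit actually seen by a process launched through slurmd could be checked with something like:

    # Locked-memory limit as seen from inside a Slurm-launched process,
    # for comparison with the interactive value above.
    srun --nodes=1 --ntasks=1 bash -c 'ulimit -l'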
I would appreciate any help getting my jobs to run with OpenMPI inside the Slurm environment.