I have two machines, both with MS-MPI 7.1 installed: one called SERVER and one called COMPUTE. The machines are on a LAN in a simple Windows workgroup (no AD), and both have an account with the same name and password.

Both are running the MSMPILaunchSvc service. Both machines can execute MPI jobs locally, which I verified by running the hostname command

SERVER> mpiexec -hosts 1 SERVER 1 hostname
SERVER
or
COMPUTE> mpiexec -hosts 1 COMPUTE 1 hostname
COMPUTE

in a terminal on the machines themselves.

I have disabled the firewall on both machines to make things easier.

My problem is that I cannot get MPI to run jobs on a remote host:

1: SERVER with MSMPILaunchSvc -> COMPUTE with MSMPILaunchSvc

SERVER> mpiexec -hosts 1 COMPUTE 1 hostname -pwd
ERROR: Failed RpcCliCreateContext error 1722

Aborting: mpiexec on SERVER is unable to connect to the smpd service on COMPUTE:8677
Other MPI error, error stack:
connect failed - The RPC server is unavailable.  (errno 1722)

What's even more frustrating is that I only sometimes get prompted for a password. The prompt suggests SERVER\Maarten as the user for COMPUTE, which is the account I am already logged in as on SERVER and which shouldn't exist on COMPUTE (shouldn't it be COMPUTE\Maarten?). Either way, it also fails:

SERVER> mpiexec -hosts 1 COMPUTE 1 hostname.exe -pwd
Enter Password for SERVER\Maarten:
Save Credentials[y|n]? n
ERROR: Failed to connect to SMPD Manager Instance error 1726

Aborting: mpiexec on SERVER is unable to connect to the smpd manager on COMPUTE:50915 error 1726

2: COMPUTE with MSMPILaunchSvc -> SERVER with MSMPILaunchSvc

COMPUTE> mpiexec -hosts 1 SERVER 1 hostname -pwd
ERROR: Failed RpcCliCreateContext error 5

Aborting: mpiexec on COMPUTE is unable to connect to the smpd service on SERVER:8677
Other MPI error, error stack:
connect failed - Access is denied.  (errno 5)

3: COMPUTE with MSMPILaunchSvc -> SERVER with smpd daemon

Aborting: mpiexec on COMPUTE is unable to connect to the smpd service on SERVER:8677
Other MPI error, error stack:
connect failed - Access is denied.  (errno 5)

4: SERVER with MSMPILaunchSvc -> COMPUTE with smpd daemon

ERROR: Failed to connect to SMPD Manager Instance error 1726

Aborting: mpiexec on SERVER is unable to connect to the smpd manager on COMPUTE:51022 error 1726
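
The errors above mix two kinds of failure: "The RPC server is unavailable" (1722) is reported when connecting, while "Access is denied" (5) suggests the connection was made and then rejected. One way to tell the two apart is to check plain TCP reachability of the smpd port before involving mpiexec at all. The following is only a rough sketch: the host names and port 8677 come from the output above, while the function names `can_connect` and `check_hosts` are made up for illustration.

```python
import socket
from typing import Dict, Iterable


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_hosts(hosts: Iterable[str], port: int = 8677) -> Dict[str, bool]:
    """Map each host name to whether the given port accepts TCP connections.

    8677 is the smpd service port shown in the mpiexec error messages.
    """
    return {host: can_connect(host, port) for host in hosts}
```

Running `print(check_hosts(["SERVER", "COMPUTE"]))` on each machine shows whether the port is reachable in both directions. If a host reports False, the problem is likely at the network or service level; if it reports True and mpiexec still fails, credentials or version mismatches are the more likely suspects.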
Maarten

1 Answer

After some trial and error, I found that these and other nonspecific errors come up when trying to run MS-MPI across nodes with different configurations (in my case, a mix of HPC Cluster 2008 and HPC Cluster 2012 with MS-MPI).

The solution was to downgrade all nodes to Windows Server 2008 R2 with HPC Cluster 2008. Because I don't use AD, I had to fall back to using the SMPD daemon and add firewall rules for it (skipping the cluster management tools altogether).
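
As an illustration of the firewall step, an inbound allow rule for the smpd executable can be added with `netsh advfirewall firewall add rule`. The sketch below only builds the command; the rule name, the helper names and the smpd.exe path are placeholders and not the exact values used on these machines, so adjust the path to wherever smpd.exe is installed on each node.

```python
import subprocess
from typing import List

# Placeholder path: adjust to the actual smpd.exe location on the node.
ASSUMED_SMPD_PATH = r"C:\Program Files\Microsoft MPI\Bin\smpd.exe"


def smpd_firewall_rule(smpd_path: str = ASSUMED_SMPD_PATH,
                       rule_name: str = "MS-MPI smpd") -> List[str]:
    """Build the netsh argument list for an inbound allow rule for smpd."""
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={rule_name}",
        "dir=in",
        "action=allow",
        f"program={smpd_path}",
        "enable=yes",
    ]


def apply_rule(args: List[str]) -> None:
    """Run the command; needs an elevated prompt on the Windows node."""
    subprocess.run(args, check=True)
```

Calling `apply_rule(smpd_firewall_rule())` from an elevated prompt on each node would add the rule. A program-based rule is used here because the smpd manager ports in the errors above (50915, 51022) vary from run to run, so a single fixed port rule would not cover them.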

Maarten