I am trying to run a simple Open MPI test on two servers:
mpirun --report-bindings --host serv1.cell,serv2.cell -np 2 hostname
Both servers run openSUSE 13.2 and have a similar network interface configuration:
ens2f0 - internet connection, External firewall zone
ens2f1 - lan connection (192.168.0.0), Internal firewall zone
ens2f2 - bonding slave, Internal firewall zone
ens2f3 - bonding slave, Internal firewall zone
bond0 - bonding interface (192.168.6.0), a different subnet from ens2f1, Internal firewall zone
serv1.cell and serv2.cell are defined in /etc/hosts as addresses in the bonding network (192.168.6.0).
Open MPI was installed from the default repos using zypper.
If both firewalls are off, everything is fine, but when one of them is running, strange things happen.
If I turn off the firewall on serv1 and leave it running on serv2, Open MPI works when launched from serv1:
serv1.cell:~ # mpirun --report-bindings --host serv1.cell,serv2.cell -np 2 hostname
serv2.cell
serv1.cell
But it does not work when launched from serv2:
serv2.cell:~ # mpirun --report-bindings --host serv1.cell,serv2.cell -np 2 hostname
If I turn off the firewall on serv2 and run it on serv1, it goes the other way around: serv2 works fine, but serv1 gets stuck.
I also tried a simple test using netcat: with both firewalls on, netcat listens on serv1, and the connection and data from serv2 are fine, and vice versa, so the firewalls seem to allow anything through bond0.
Turning the firewalls off is not a solution, so how should I configure Open MPI (or the firewall) to make both servers work properly?
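One thing that may be worth testing, as a sketch under assumptions rather than a confirmed fix: each host has several interfaces in different firewall zones, so Open MPI may be choosing an interface other than bond0 for some of its connections. The script below only assembles and prints an mpirun command that asks the TCP components to use bond0. The MCA parameter names btl_tcp_if_include and oob_tcp_if_include should be checked against the installed version (for example with ompi_info), and the variable names IFACE, HOSTS and MPI_CMD are made up for illustration.

```shell
#!/bin/sh
# Sketch only: build an mpirun command that pins Open MPI's TCP traffic
# to the bonding interface. Nothing is executed; the command is printed.

# Interface and host names taken from the setup described above.
IFACE="bond0"
HOSTS="serv1.cell,serv2.cell"

# Assumed MCA parameters (verify with `ompi_info` on the installed version):
#   btl_tcp_if_include - interface used for MPI point-to-point traffic
#   oob_tcp_if_include - interface used for the runtime's out-of-band channel
MPI_CMD="mpirun --mca btl_tcp_if_include $IFACE --mca oob_tcp_if_include $IFACE --report-bindings --host $HOSTS -np 2 hostname"

# Print the command so it can be reviewed before being run by hand.
echo "$MPI_CMD"
```

If the printed command looks right, it can be run on each server in turn with one firewall enabled, the same way as the tests above, to see whether the hang depends on which interface is picked.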