I want to set up a 3-node ring network in which each node connects to the other two directly through its two Ethernet ports, without a switch or router.
The interface configuration looks like this:
I used ifconfig on each node to configure each port, and verified that I can ssh from each node to the other two.
But a simple ring_c example doesn't work. So I turned on --mca btl_base_verbose 30 and could see that node1 was trying to use 23.0.0.2 (the link between node2 and node3) to reach node2, even though node1 has a direct link to node2.
The output log looks like this:
[node1:01828] btl: tcp: attempting to connect() to [[19529,1],1] address 23.0.0.2 on port 1024
[[19529,1],0][btl_tcp_endpoint.c:606:mca_btl_tcp_endpoint_start_connect] from node1 to: node2 Unable to connect to the peer 23.0.0.2 on port 4: Network is unreachable
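To make the expectation concrete, here is a minimal Python sketch of the same-subnet check I assumed would be applied before a peer address is chosen. Only 23.0.0.2 comes from the log; the node1 addresses (12.0.0.1, 13.0.0.1), the node2 address 12.0.0.2, and the /24 prefix are placeholder assumptions for illustration. This is not Open MPI's actual selection logic, just the reasoning I had in mind.

```python
import ipaddress

# Hypothetical addresses for node1's two ports (placeholders, /24 assumed).
NODE1_INTERFACES = [
    ipaddress.ip_interface("12.0.0.1/24"),  # assumed link node1 <-> node2
    ipaddress.ip_interface("13.0.0.1/24"),  # assumed link node1 <-> node3
]


def directly_reachable(peer_address, local_interfaces):
    """Return True if peer_address lies in the subnet of any local interface."""
    peer = ipaddress.ip_address(peer_address)
    return any(peer in iface.network for iface in local_interfaces)


# 23.0.0.2 is the address from the log (the node2 <-> node3 link).
print(directly_reachable("23.0.0.2", NODE1_INTERFACES))  # False
# 12.0.0.2 is a hypothetical node2 address on the assumed direct link.
print(directly_reachable("12.0.0.2", NODE1_INTERFACES))  # True
```

Under these assumptions node1 has no interface on the 23.0.0.0/24 network, which matches the "Network is unreachable" error above.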
I've read the following posts and FAQs but still can't understand this behavior:
How does Open MPI know which IP addresses are routable to each other in Open MPI 1.3 (and beyond)?
How do I tell Open MPI which IP interfaces / networks to use?
Open MPI User's Mailing List Archives
Any pointers would be appreciated! Thanks in advance!
My open-mpi info:
Open MPI: 1.0.0.22
Open RTE: 1.0.0.22
OPAL: 1.0.0.22
MPI API: 2.1
Best, Shang