I have a cluster whose nodes are connected by InfiniBand (IB) in a fat-tree topology. The switches are QLogic 12300s.
The problem is that certain nodes can't talk to each other, even though other nodes can talk to both of the affected nodes.
I used ibtracert to diagnose the problem. The strange thing is that if I run the command on a separate node that can talk to both affected nodes, it succeeds and reports a feasible route between them.
However, ibtracert fails with an error when I run it from either of the two affected nodes.
What is the likely reason for this?
Thanks.