Recently, I ran watch -n 1 ifconfig on one of our Linux cluster compute nodes while it was running a 48-process MPI job, distributed over several nodes.
Oddly, while Ethernet packets seem to be counted correctly (a few kB/s due to the SSH session), the IB adapter appears to stay idle (no change in RX/TX packets and bytes).
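For reference, the counters I am watching are the ones the kernel exposes per network device under sysfs (the same numbers ifconfig prints). A minimal sketch of the check, assuming the IPoIB interface is called ib0 (substitute whatever ip link shows on your node); the loopback device is used in the usage line only because it exists everywhere:

```shell
#!/bin/sh
# Print the kernel's byte counters for one network interface.
# These come from /sys/class/net/<iface>/statistics, the standard Linux
# sysfs layout, and are the same values ifconfig reports.
show_counters() {
    iface="$1"
    stats="/sys/class/net/$iface/statistics"
    if [ ! -d "$stats" ]; then
        echo "no such interface: $iface" >&2
        return 1
    fi
    echo "$iface rx_bytes=$(cat "$stats/rx_bytes") tx_bytes=$(cat "$stats/tx_bytes")"
}

# Usage: every Linux box has a loopback device; on the cluster node
# you would pass e.g. ib0 (the assumed IPoIB interface name) instead.
show_counters lo
```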
MPI over IB is definitely working on our cluster (we have run several checks, and in any case people would have noticed if it were not). Even more strangely, if I ping the node's InfiniBand HCA (i.e., its IPoIB address) from another node, packets are suddenly counted.
Admittedly my knowledge of IB is quite limited, but I know that one key reason for InfiniBand's improved performance is that it bypasses the (kernel) network stack by implementing the transport directly in hardware (or so I thought; please correct me if I'm wrong!).
My explanation would be that the kernel cannot count this traffic because the packets never reach it: they are handled entirely by the HCA, so the relevant layer never sees them. Does this sound reasonable? However, I'm not sure what is happening in the ICMP case then. Maybe data sent over IPoIB does trigger the kernel's packet-counting routines, while "IB-native" protocols (verbs, RDMA) do not?
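If that explanation is right, I would expect the HCA's own hardware port counters to keep incrementing even though the netdev counters do not, since those are maintained by the adapter itself and should include verbs/RDMA traffic. A sketch of how I would check this, using the standard Linux RDMA sysfs layout (/sys/class/infiniband/<hca>/ports/<n>/counters); note that, as I understand it, port_xmit_data and port_rcv_data count in units of 4 octets, not bytes:

```shell
#!/bin/sh
# Dump the hardware port counters of every InfiniBand HCA on this node.
# Unlike /sys/class/net statistics, these are kept by the adapter and
# should also reflect kernel-bypass (verbs/RDMA) traffic.
dump_hca_counters() {
    found=0
    for hca in /sys/class/infiniband/*; do
        [ -d "$hca" ] || continue   # unmatched glob stays literal; skip it
        found=1
        for port in "$hca"/ports/*; do
            c="$port/counters"
            echo "$(basename "$hca") port $(basename "$port"):" \
                 "xmit_data=$(cat "$c/port_xmit_data")" \
                 "rcv_data=$(cat "$c/port_rcv_data")"
        done
    done
    [ "$found" -eq 1 ] || echo "no InfiniBand HCA found under /sys/class/infiniband"
}

dump_hca_counters
```

Running this once per second (e.g. under watch) during the MPI job should show whether the hardware sees the traffic that ifconfig misses.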
Unfortunately, I could not find any information on this matter on the internet.