(Redirected here from Network Engineering.)

I have a number of InfiniBand-connected servers whose ib0 network interfaces all disappeared from ip and ifcfg within a few hours of each other. I tried rebooting one of the servers, but that didn't help: it came back up in exactly the same state.

ibstat and ibstatus show the IB card as active, and I can reach the nodes with ibping even without an ib0 interface. However, the IB network is unusable, apparently because the interface isn't visible at all. I checked lsmod for all the ib_-related modules and they looked OK.
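One way to confirm the mismatch described above (the port is active at the IB layer, but there is no ib0 netdev) is to read sysfs directly. The following is a minimal Python sketch that assumes the standard Linux sysfs layout (`/sys/class/infiniband/<dev>/ports/<n>/state` and `/sys/class/net`). It also assumes IPoIB interfaces keep names starting with `ib`, which may not hold if udev renames them. The function names are hypothetical.

```python
from pathlib import Path


def ib_port_states(sys_root="/sys"):
    """Map "device/port" to the raw sysfs state string, e.g. {"mlx5_0/1": "4: ACTIVE"}."""
    states = {}
    base = Path(sys_root) / "class" / "infiniband"
    if not base.is_dir():
        return states  # no RDMA devices registered (or not a Linux host)
    for dev in sorted(base.iterdir()):
        ports = dev / "ports"
        if not ports.is_dir():
            continue
        for port in sorted(ports.iterdir()):
            state_file = port / "state"
            if state_file.is_file():
                states[f"{dev.name}/{port.name}"] = state_file.read_text().strip()
    return states


def ipoib_netdevs(sys_root="/sys"):
    """Interface names under /sys/class/net starting with "ib" (assumed IPoIB naming)."""
    net = Path(sys_root) / "class" / "net"
    if not net.is_dir():
        return []
    return sorted(p.name for p in net.iterdir() if p.name.startswith("ib"))


if __name__ == "__main__":
    states = ib_port_states()
    netdevs = ipoib_netdevs()
    for port, state in states.items():
        print(f"{port}: {state}")
    print("IPoIB netdevs:", ", ".join(netdevs) if netdevs else "none")
    if any("ACTIVE" in s for s in states.values()) and not netdevs:
        # The port is up at the IB layer, but no ib* netdev was registered.
        print("IB port is ACTIVE but no IPoIB interface exists")
```

On an affected node, this should report the port as ACTIVE together with "none" for the netdevs, which matches the ibstat/ibping observations.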

Interestingly, I found the following in dmesg, but I couldn't find anything online that matched the issue:

```
Mellanox Connect-IB Infiniband driver v4.7-1.0.0
Request for unknown module key 'Mellanox Technologies signing key:  err -11
mlx5_0: ipoib_transport_dev_init failed
ib0 failed to init HW resource
mlx5_0: failed to initialize device: ib0 port 1 (ret = -12)
mlx5_0: couldn't register ipoib port 1; error -12
```
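The negative numbers in that log are kernel return codes, that is, negated Linux errno values. On Linux, -12 is ENOMEM ("Cannot allocate memory") and -11 is EAGAIN ("Resource temporarily unavailable"). The small Python sketch below decodes them using the standard `errno` and `os.strerror` facilities. The regex and function name are illustrative assumptions. Because errno numbering is platform-specific, it should be run on the Linux host itself. It only translates the codes and does not diagnose the underlying cause.

```python
import errno
import os
import re

# Negative kernel return codes as printed in the dmesg lines above:
# "err -11", "(ret = -12)", "error -12".
CODE_RE = re.compile(r"(?:error|err|ret =)\s*(-\d+)")


def decode_kernel_errors(lines):
    """Yield (errno value, symbolic name, message) for each code found in the lines."""
    for line in lines:
        match = CODE_RE.search(line)
        if match:
            code = -int(match.group(1))  # the kernel returns -errno; flip the sign
            yield code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


DMESG = [
    "Request for unknown module key 'Mellanox Technologies signing key:  err -11",
    "mlx5_0: ipoib_transport_dev_init failed",
    "mlx5_0: failed to initialize device: ib0 port 1 (ret = -12)",
    "mlx5_0: couldn't register ipoib port 1; error -12",
]

if __name__ == "__main__":
    for code, name, message in decode_kernel_errors(DMESG):
        print(code, name, message)
```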
user407898