I'm seeing network problems when I trigger a PCI rescan on Linux with echo 1 > /sys/bus/pci/rescan. I observe data loss, and sometimes deadlocks in client/server applications or processes turning into zombies.
This happens on a node with two InfiniBand controllers and a few PCIe devices. I need to trigger a PCI rescan when one of these devices fails, in order to re-enumerate the PCIe tree and get the device listed again. The setup:
- distribution: CentOS 7.2 (same on 7.1)
- kernel: 3.10.0
- OFED: OFED-3.1-1.0.3 (same with 3.4)
- firmware: 12.17.1010 (Mellanox MT27700 Family [ConnectX-4])
- grub boot option: pci=realloc=on
Is it possible to rescan the PCI bus while there is network activity without causing these issues? If not, is there a more selective way to re-enumerate just a part of the PCIe tree?
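
For reference, the kind of selective rescan I have in mind is sketched below. It assumes the per-device remove and rescan entries under /sys/bus/pci/devices are available on this kernel (as I understand it, a device's rescan entry rescans that device's parent bus and the buses below it, rather than the whole tree). The function name, the SYSFS_PCI override and the device addresses in the usage line are made-up placeholders, and I have not verified that this avoids the network issues.

```shell
#!/bin/sh
# Sketch only: rescan the part of the PCI tree around one bridge
# instead of writing to /sys/bus/pci/rescan.
# SYSFS_PCI is an assumption added for dry runs against a fake tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# rescan_below BRIDGE_BDF [FAILED_BDF]
#   BRIDGE_BDF: address of the bridge/port above the failed device
#   FAILED_BDF: address of the failed device, if it is still listed
rescan_below() {
    bridge="$1"
    failed="${2:-}"

    # Remove the stale entry of the failed device first, if present.
    if [ -n "$failed" ] && [ -e "$SYSFS_PCI/$failed/remove" ]; then
        echo 1 > "$SYSFS_PCI/$failed/remove"
    fi

    # Bail out if the bridge has no rescan entry.
    if [ ! -e "$SYSFS_PCI/$bridge/rescan" ]; then
        echo "no rescan entry for $bridge" >&2
        return 1
    fi

    # Rescan the bridge's parent bus and the buses below it.
    echo 1 > "$SYSFS_PCI/$bridge/rescan"
}
```

Usage would be something like rescan_below 0000:03:00.0 0000:04:00.0 (run as root), where both addresses are placeholders for the real bridge and failed device as shown by lspci -t.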