
On my machine (Linux kernel 3.2.38), a PCI device reports the wrong subsystem IDs (subsystem vendor and subsystem device IDs) after boot. If I then physically unplug and re-plug the device while the system is still running (i.e., hot-plug it), it gets the correct IDs.

Note that the wrong subsystem vendor and subsystem device IDs are identical to the device's own vendor and device IDs (see the first two lines of the lspci output below).

Here is the output of lspci -vvnn before and after hot-plugging the device:

Before hot-plugging:

0b:0f.0 Bridge [0680]: Device [1a88:4d45] (rev 05)
Subsystem: Device [1a88:4d45]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32 (250ns min, 63750ns max)
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at 2100 [size=256]
Region 1: I/O ports at 2000 [size=256]
Region 2: Memory at 92920000 (32-bit, non-prefetchable) [size=64]

After hot-plugging:

0b:0f.0 Bridge [0680]: Device [1a88:4d45] (rev 05)
Subsystem: Device [007d:5a14]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at 2100 [disabled] [size=256]
Region 1: I/O ports at 2000 [disabled] [size=256]
Region 2: [virtual] Memory at 92920000 (32-bit, non-prefetchable) [size=64]
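The same IDs that lspci shows can also be read directly from sysfs, which is handy when checking for the symptom from a script. A minimal sketch, assuming sysfs is mounted at /sys and the device sits in PCI domain 0000 at 0b:0f.0 (the domain prefix is an assumption); the pci_ids function and the SYSFS override are hypothetical names introduced only for illustration and testing.

```shell
#!/bin/sh
# Read a PCI function's IDs straight from sysfs, without going through lspci.
# SYSFS defaults to /sys; it is a hypothetical override so the function can
# be pointed at a fake directory tree for testing.
SYSFS=${SYSFS:-/sys}

# Print "vendor:device subsystem subvendor:subdevice" for one function,
# e.g. pci_ids 0000:0b:0f.0 (PCI domain 0000 assumed).
pci_ids() {
    dev="$SYSFS/bus/pci/devices/$1"
    # Each attribute holds a value such as "0x1a88"; the 0x prefix is dropped below.
    v=$(cat "$dev/vendor") || return 1
    d=$(cat "$dev/device") || return 1
    sv=$(cat "$dev/subsystem_vendor") || return 1
    sd=$(cat "$dev/subsystem_device") || return 1
    echo "${v#0x}:${d#0x} subsystem ${sv#0x}:${sd#0x}"
}

# When run directly, the first argument is the device address.
if [ $# -gt 0 ]; then
    pci_ids "$1"
fi
```

Going by the boot-time lspci output above, running it with 0000:0b:0f.0 should print the mirrored 1a88:4d45 pair as the subsystem IDs.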

My question: Is there a way to get the correct IDs without hot-plugging the device, e.g. by forcing the kernel to re-read the PCI device IDs through a PCI bus rescan, re-enumeration, or re-configuration?

Any help would be highly appreciated. Thanks.

PS. The problem doesn't seem to be related to the kernel or other software, as it also occurs when I boot into the UEFI internal shell.

PPS. The PCI device in this case is the MEN F206N and "my machine" is the MEN F22P.

Jahanzeb Farooq
  • Do you know what happens when the system boots with the device plugged in? Does it have the correct vendor ID at that point? – vivekian2 Apr 15 '14 at 17:21
  • Yes, on boot it sees the wrong IDs. Removing and re-adding the device while the system is still up (i.e. hot-plugging) fixes them. – Jahanzeb Farooq Apr 15 '14 at 17:30
  • I would suggest looking at the source code of the MEN F206N driver to see what it does in its __init (boot-time) and __devinit (hotplug) paths. It would be good to check whether the two implementations differ. – vivekian2 Apr 15 '14 at 17:35
  • Thanks. Yes, I might need to try that, though I would prefer a fix that doesn't involve changing the code. – Jahanzeb Farooq Apr 15 '14 at 18:08
  • If you just need this to work somehow, in a bash script or similar, you could work around it (WAR) by treating both subsystem vendor IDs as belonging to the same device. Of course, it's hard to say without knowing your complete use case. – vivekian2 Apr 15 '14 at 18:12
  • The problem is that I need to use two of these PCI devices for two different purposes, and our software uses the subsystem IDs to identify them. If both devices have the same subsystem IDs, it cannot tell which one is which. – Jahanzeb Farooq Apr 15 '14 at 18:48
  • Some solutions on the web suggest doing this: echo "1" > /sys/bus/pci/rescan. You can give it a try, though I believe the kernel must be built with CONFIG_HOTPLUG enabled. – vivekian2 Apr 15 '14 at 19:14
  • Thanks. I had tried that (and everything else I could find on Google) before posting here. I managed to remove the device with "echo 1 > /sys/bus/pci/devices/*/remove" and re-add it with "echo 1 > /sys/bus/pci/rescan", but it still came back with the wrong IDs. – Jahanzeb Farooq Apr 15 '14 at 19:19
  • You should definitely get in touch with MEN's customer engineers or report this as a bug. Also, as a side note, try this on two different x86 systems to make sure the issue really reproduces. – vivekian2 Apr 15 '14 at 20:38
  • I have tried it on three different systems, with the same result :-) And yes, I have already contacted MEN; they said they will look into it, but I am not expecting a fast response. – Jahanzeb Farooq Apr 15 '14 at 20:41
  • If the IDs are wrong at boot time, that points to a problem with the F206N card. I see it is an FPGA-based board, so this smells like an FPGA loading/configuration issue. The hotplug/rescan would simply be a band-aid on the "real" issue, and I would be hesitant to ship a production system with it. The vendor should be willing to work with you to fix it. – myron-semack May 01 '14 at 14:56
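Along the lines of the script-level workaround discussed in these comments, a script can at least detect which boards came up with the mirrored IDs before the software relies on them. A minimal sketch, assuming the symptom is exactly as described in the question (subsystem IDs equal to the device's own 1a88:4d45); the find_mirrored_subsys function and the SYSFS override are hypothetical names.

```shell
#!/bin/sh
# List PCI functions whose subsystem IDs merely repeat their own vendor and
# device IDs (the boot-time symptom shown in the question).
# SYSFS is a hypothetical override so the function can be tested on a fake tree.
SYSFS=${SYSFS:-/sys}

# Usage: find_mirrored_subsys [vendor] [device]
# Defaults are the F206N IDs from the lspci output, in sysfs's 0x form.
find_mirrored_subsys() {
    want_v=${1:-0x1a88}
    want_d=${2:-0x4d45}
    for dev in "$SYSFS"/bus/pci/devices/*; do
        [ -r "$dev/vendor" ] || continue
        v=$(cat "$dev/vendor"); d=$(cat "$dev/device")
        # Skip anything that is not the board we are looking for.
        if [ "$v" != "$want_v" ] || [ "$d" != "$want_d" ]; then
            continue
        fi
        sv=$(cat "$dev/subsystem_vendor"); sd=$(cat "$dev/subsystem_device")
        # Print the PCI address of each board that came up with mirrored IDs.
        if [ "$sv" = "$v" ] && [ "$sd" = "$d" ]; then
            basename "$dev"
        fi
    done
}
```

An empty result means every matching board reported subsystem IDs different from its own vendor/device IDs. The check only detects the problem; it does not repair the IDs.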

2 Answers


You can force a rescan of the PCI bus with:

# echo 1 > /sys/bus/pci/rescan
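A rescan on its own only adds devices the kernel does not already know about, so to make it re-read an existing function the device has to be removed first, which is what the asker describes in the comments on the question. A minimal sketch of that remove-then-rescan sequence for a single device, assuming root privileges and a kernel with PCI hotplug support in sysfs; the rescan_one function and the SYSFS override are hypothetical names. Per those comments, this did not correct the IDs in the asker's case, so treat it as a diagnostic rather than a fix.

```shell
#!/bin/sh
# Remove one PCI function, rescan, and print the subsystem IDs the kernel
# reads when it re-adds the device. Must run as root on the real /sys.
# SYSFS is a hypothetical override so the function can be tested on a fake tree.
SYSFS=${SYSFS:-/sys}

rescan_one() {
    dev="$SYSFS/bus/pci/devices/$1"   # e.g. 0000:0b:0f.0
    [ -e "$dev" ] || { echo "no such device: $1" >&2; return 1; }

    # Drop the device from the kernel's view (its sysfs entry disappears).
    echo 1 > "$dev/remove" || return 1
    # Rescan all buses: devices already known are skipped, while the removed
    # one is found again and its configuration header is read afresh.
    echo 1 > "$SYSFS/bus/pci/rescan" || return 1

    # Give the sysfs entry a few seconds to come back.
    i=0
    while [ ! -e "$dev" ] && [ "$i" -lt 5 ]; do
        sleep 1
        i=$((i + 1))
    done
    [ -e "$dev" ] || { echo "$1 did not reappear" >&2; return 1; }

    sv=$(cat "$dev/subsystem_vendor"); sd=$(cat "$dev/subsystem_device")
    echo "${sv#0x}:${sd#0x}"
}
```

For example, rescan_one 0000:0b:0f.0 prints the subsystem IDs seen after the device is re-added, which can be compared with the values lspci showed at boot.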

raghav3276

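A closer look at your lspci output before and after hot-plugging the device shows more differences than just the subsystem device/vendor IDs (for example, the Control line changes from I/O+ Mem+ BusMaster+ to I/O- Mem- BusMaster-, the Latency line disappears, and the I/O regions are marked [disabled]). I'd be surprised if the device functions as expected after hot-plugging.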

Besides, forcing PCI re-enumeration is not possible, primarily because other devices may already have been enumerated correctly and be functioning. How would re-enumeration deal with those? (There are other reasons too.)

Prafulla

  • The device functions as expected after hot-plugging. That it re-enumerates the device when I hot-plug it is only my guess. The idea was to somehow force that without removing the device, so that I could have an easy workaround in a bash script. Something definitely happens when the device is hot-plugged that fixes the IDs; if it is not re-enumeration, it must be something else, but I have no idea what. – Jahanzeb Farooq Apr 18 '14 at 13:51