
I am trying to access an I/O port of a PCI device under Linux x86_64; however:

  • inl() only ever reads 0xFFFFFFFF
  • outl() does not affect the hardware

It works under Windows (XP x86) as long as any driver (I tested with a completely empty one) is loaded for that device.

The I/O port range differs between the two OSes and appears to be auto-configured by the PCI bus driver.

No amount of enabling/disabling/installing/configuring other devices, buses or BIOS settings changes the port range that is assigned to the device by either OS.

The Linux driver does only the following:

  • From my kernel module init() function:
    • pci_register_driver() specifying relevant PCI vendor/device IDs
  • From my pci_probe() handler function:
    • pci_enable_device()
    • pci_resource_*(), which return the same PCI BAR data as lspci
    • pci_request_regions()
    • inl() / outl()
  • From my kernel module exit() function:
    • pci_unregister_driver()

Here is the code:

#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/io.h>
#include <linux/uaccess.h>

static void pci_release(struct pci_dev* pcidev)
{
    /* Undo what pci_probe() set up, in reverse order */
    pci_release_regions(pcidev);
    pci_disable_device(pcidev);
    pr_err("pci_disable_device\n");
}

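static int pci_probe(struct pci_dev* pcidev, const struct pci_device_id* pcidev_id)
{
    int result;
    u32 value;

    /* Enable the device (turns on decoding for the BAR types it has) */
    result = pci_enable_device(pcidev);
    pr_err("pci_enable_device()=%d\n", result);
    if (result)
        return result;

    /* BAR 0 as seen by the PCI core */
    pr_err("res0=0x%lX\n", (ulong)pci_resource_flags(pcidev, 0));
    pr_err("start=0x%lX\n", (ulong)pci_resource_start(pcidev, 0));
    pr_err("end=0x%lX\n", (ulong)pci_resource_end(pcidev, 0));

    /* Claim all BARs for this driver */
    result = pci_request_regions(pcidev, "iotest");
    pr_err("pci_request_regions()=%d\n", result);
    if (result) {
        pci_disable_device(pcidev);
        return result;
    }

    /* Read the first 32-bit register of the I/O BAR */
    value = inl(pci_resource_start(pcidev, 0));
    pr_err("inl()=%X\n", value);

    return 0;
}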

static const struct pci_device_id pci_ids[] =
{
    { PCI_DEVICE(0x4321, 0x9876), },
    { 0, }
};
MODULE_DEVICE_TABLE(pci, pci_ids);

static struct pci_driver pcidriver =
{
    .probe = pci_probe,
    .remove = pci_release,
    .id_table = pci_ids,
    .name = "iotest"
};

static int __init kmodule_init(void)
{
    pr_err("init\n");
    return pci_register_driver(&pcidriver);
}

static void __exit kmodule_exit(void)
{
    pr_err("exit\n");
    pci_unregister_driver(&pcidriver);
}

module_init(kmodule_init);
module_exit(kmodule_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("iotest");
MODULE_DESCRIPTION("iotest");
MODULE_VERSION("1.0");

My pci_probe() function is called and no errors are returned, but I/O behaves as if the ports were not actually allocated.

[ 8809.201100] init
[ 8809.203209] pci_enable_device()=0
[ 8809.205237] res0=0x40101
[ 8809.206911] start=0xEF00
[ 8809.208574] end=0xEF3F
[ 8809.210230] pci_request_regions()=0
[ 8809.211868] inl()=FFFFFFFF
[ 8820.426361] exit
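The res0 value in the log can be decoded to check that the PCI core really sees BAR 0 as an I/O BAR. The sketch below is a userspace helper (the function names are hypothetical); the constant values are copied from include/linux/ioport.h. As far as I can tell from drivers/pci/probe.c, for an I/O BAR the kernel copies the BAR's low flag bits into the bus-specific bits and adds IORESOURCE_SIZEALIGN, so 0x40101 would mean "I/O type, size-aligned, BAR bit 0 set".

```c
#include <assert.h>
#include <stdio.h>

/* Values as defined in include/linux/ioport.h, copied for a userspace build */
#define IORESOURCE_BITS      0x000000ffUL /* bus-specific bits */
#define IORESOURCE_TYPE_BITS 0x00001f00UL /* resource type field */
#define IORESOURCE_IO        0x00000100UL
#define IORESOURCE_MEM       0x00000200UL
#define IORESOURCE_PREFETCH  0x00002000UL
#define IORESOURCE_SIZEALIGN 0x00040000UL

/* Name of the resource type encoded in a struct resource flags word */
static const char *resource_type_name(unsigned long flags)
{
    switch (flags & IORESOURCE_TYPE_BITS) {
    case IORESOURCE_IO:  return "IO";
    case IORESOURCE_MEM: return "MEM";
    default:             return "other";
    }
}

/* Print a short breakdown of the flags word */
static void describe_resource_flags(unsigned long flags)
{
    printf("0x%lX: type=%s", flags, resource_type_name(flags));
    if (flags & IORESOURCE_SIZEALIGN)
        printf(" SIZEALIGN");
    if (flags & IORESOURCE_PREFETCH)
        printf(" PREFETCH");
    printf(" bus-bits=0x%lX\n", flags & IORESOURCE_BITS);
}

/* For a PCI I/O BAR, the bus-specific bits hold the BAR's low flag bits;
 * bit 0 set marks I/O space. */
static unsigned long io_bar_low_bits(unsigned long flags)
{
    assert((flags & IORESOURCE_TYPE_BITS) == IORESOURCE_IO);
    return flags & IORESOURCE_BITS;
}
```

For the logged value, describe_resource_flags(0x40101) prints `0x40101: type=IO SIZEALIGN bus-bits=0x1`, so the flags themselves look like an ordinary I/O BAR.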

The I/O port range matches the lspci -n -vv output:

03:0e.0 1100: 4321:9876
    Subsystem: 4321:9876
    Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 22
    Region 0: I/O ports at ef00 [size=64]
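The start, end and size values agree with each other, which suggests the Linux side programmed a consistent range. A small userspace check of that arithmetic follows; the helper names are hypothetical and the mask constants mirror include/uapi/linux/pci_regs.h. It assumes the raw BAR in config space reads 0xEF01 (base 0xEF00 plus bit 0 for I/O space, as implied by lspci) and that sizing the BAR by writing all ones reads back 0xFFC1 in the low 16 bits; that is what the PCI spec implies for a 64-byte I/O BAR, not a value measured on this card.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as in include/uapi/linux/pci_regs.h */
#define PCI_BASE_ADDRESS_SPACE_IO 0x01U
#define PCI_BASE_ADDRESS_IO_MASK  (~0x03U)

/* Base port encoded in a raw 32-bit I/O BAR read from config space */
static uint32_t io_bar_base(uint32_t raw_bar)
{
    assert(raw_bar & PCI_BASE_ADDRESS_SPACE_IO); /* bit 0 set => I/O BAR */
    return raw_bar & PCI_BASE_ADDRESS_IO_MASK;
}

/* Size of an I/O BAR from the value read back after writing all ones:
 * the lowest writable address bit gives the size. Only the low 16 bits
 * are considered because x86 I/O space is 16 bits wide. */
static uint32_t io_bar_size(uint32_t sized_readback)
{
    uint32_t bits = sized_readback & PCI_BASE_ADDRESS_IO_MASK & 0xFFFFU;
    return bits & (~bits + 1U); /* lowest set bit; 0 if BAR unimplemented */
}

/* PCI BARs must be naturally aligned to their size */
static int io_bar_is_aligned(uint32_t base, uint32_t size)
{
    return size != 0 && (base & (size - 1U)) == 0;
}
```

With these inputs, io_bar_base(0xEF01) is 0xEF00, io_bar_size(0xFFC1) is 0x40, and 0xEF00 + 0x40 - 1 = 0xEF3F, matching end=0xEF3F and size=64; 0xEF00 is also 64-byte aligned.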

Relevant section of /proc/ioports:

0d00-ffff : PCI Bus 0000:00
  e000-efff : PCI Bus 0000:02
    e000-efff : PCI Bus 0000:03
      ef00-ef3f : 0000:03:0e.0
        ef00-ef3f : iotest
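One reason two OSes can end up with different port numbers is where, inside the bridge window (e000-efff above), each one places the 64-byte range. The sketch below is a toy userspace model, not the real Linux or Windows allocator: it searches a window for a naturally aligned free slot either bottom-up or top-down. `struct range`, `place_range()` and any busy-range list passed to it are hypothetical illustrations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An already-assigned port range, inclusive on both ends */
struct range { uint32_t start, end; };

static int overlaps_any(uint32_t s, uint32_t e, const struct range *busy, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s <= busy[i].end && busy[i].start <= e)
            return 1;
    return 0;
}

/* Find a naturally aligned, free `size`-byte slot inside [win_start, win_end],
 * scanning upward if bottom_up is nonzero, otherwise downward.
 * Returns 0 and stores the start in *out, or -1 if nothing fits. */
static int place_range(uint32_t win_start, uint32_t win_end, uint32_t size,
                       const struct range *busy, size_t n, int bottom_up,
                       uint32_t *out)
{
    assert(size != 0 && (size & (size - 1)) == 0); /* BAR sizes are powers of two */
    uint32_t lo = (win_start + size - 1) & ~(size - 1); /* lowest aligned start */
    if (lo > win_end || win_end - lo + 1 < size)
        return -1;
    uint32_t hi = (win_end + 1 - size) & ~(size - 1);   /* highest aligned start */

    if (bottom_up) {
        for (uint32_t s = lo; s <= hi; s += size)
            if (!overlaps_any(s, s + size - 1, busy, n)) { *out = s; return 0; }
    } else {
        for (uint32_t s = hi; ; s -= size) {
            if (!overlaps_any(s, s + size - 1, busy, n)) { *out = s; return 0; }
            if (s == lo)
                break;
        }
    }
    return -1;
}
```

In an empty e000-efff window the two directions pick 0xE000 and 0xEFC0 for a 64-byte BAR; a real allocator's choice also depends on what else is already assigned, so this only shows why the same card can legitimately land at different addresses.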

Does anyone have an idea why this could be happening? Am I missing something?

Jack White
  • `lspci -vv` confirms that the autoconfigured IO port number matches the one your driver is using? Without a [mcve], the other explanation is that maybe your code *doesn't* do what you think it does, and part of the Linux-specific code has some showstopper bug, silly or otherwise. Did you single-step it with a kernel debugger to check that your functions are really being called? – Peter Cordes Dec 30 '22 at 10:26
  • added code example. – Jack White Dec 30 '22 at 12:20
  • What is the result you are expecting? I.o.w. show us the correct values. And also can you check if the device is powered on? And one more, what's the kernel version and have you tried with the latest one (v6.1.y)? – 0andriy Jan 03 '23 at 10:07
  • The kernel version I'm using for development is old (4.18) for completely unrelated reasons. Thank you very much for the suggestion; I will try a modern 6.x kernel, no idea how this didn't occur to me earlier. However, I don't expect the PCI resource allocation subsystem to have changed significantly since then. The device is a PCI board; it is indeed powered on and its logic chips receive power normally. The PCI configuration (BAR) is read correctly by Linux and Windows alike. – Jack White Jan 04 '23 at 17:12
  • I cannot access the system right now, but IIRC the correct value on 1st access should be `0xED80000C` or something like that. Surely not `0xFFFFFFFF`. At least that is what I'm getting on Windows with the driver that does nothing except returning success immediately. Does that exact value help you somehow? I have tried reading and writing to other ports that the card provides, both in Windows and Linux. On Windows I see the hardware reacting. On Linux I don't, as if the device is not being selected (it's a CPLD without source code, so I cannot probe inside to confirm that specifically). – Jack White Jan 04 '23 at 17:27
  • My conclusion so far is the device itself is most likely at fault at least somewhat, but I'd like to get better idea of what may be wrong, and possibly create some kind of a workaround. To do that I'm asking if someone has any idea *what* exactly could even be wrong in the first place, and how to diagnose it. – Jack White Jan 04 '23 at 17:29
  • Yeah, since you mentioned CPLD (or FPGA), the bug is most likely there. The PCI specification tells that the device should be able to be relocated. You need to look into the sources of that CPLD firmware to see how the address decoder is implemented. – 0andriy Jan 05 '23 at 08:40
  • Another point is that Linux starts allocating resources from the end towards beginning of the range, while Windows does the opposite. – 0andriy Jan 05 '23 at 08:41
  • 0xffffffff may mean two things: **1.** Device is powered off (probably not your case because you see the PCI configuration space available). **2.** It does not decode the address correctly (ignoring some bits? or requiring some bits to be in a specific state?). Since it's a CPLD, it's most likely that the firmware was written in a way that neglects the specification, with only one configuration in mind (they never perform proper QA). – 0andriy Jan 05 '23 at 08:44
  • That said, I highly recommend communicating with the hardware vendor and getting their support (if it was purchased); otherwise I recommend changing the hardware (if possible). Finally, patch the kernel once you get it working (you may use a sniffer in Windows to see how the HW is programmed). – 0andriy Jan 05 '23 at 08:46
  • I will communicate with the developer; hopefully they have better debugging tools than I do. But I'd like to make sure the problem is not on my side, i.e. Linux not communicating through PCI-to-PCI bridges correctly. It appears to be doing fine with other devices in the same slot though, so I can't isolate the issue to that yet. It is quite likely the CPLD has an addressing bug; my point is to make sure that is the case. The hardware IS powered correctly. And it works under Windows. – Jack White Jan 05 '23 at 12:41
  • `Another point is that Linux starts allocating resources from the end towards beginning of the range, while Windows does the opposite.` - this does seem to be the case here. What kind of sniffer can I use? What sort of kernel patch can be used to diagnose this situation? – Jack White Jan 05 '23 at 12:44
  • Has the system been booted with the `pci=skip_isa_align` kernel command-line parameter (possibly within a comma-separated list of `pci=` options)? If so, try booting without that parameter. – Ian Abbott Jan 09 '23 at 13:59
  • @Ian Abbott, no, but I will try that too. Also I may need to clarify: the device works in Windows on the same machine in the same slot with the same settings. The machine has dual boot. I suspect the PCI-PCI bridge might be at fault so will try in another machine too. – Jack White Jan 10 '23 at 02:07
  • Kernel version and suggested configuration changes have no effect. What does have an effect is moving the card to another PCI slot. Moving to adjacent slot made no difference, but moving to an opposite one did: although IO port numbers in Linux remain the same, IO ports in Windows are now same as in Linux. And it works in that slot. Will free up everything blocking all other slots and do more testing to see what changed. The slot I tested in previously is NOT defective. – Jack White Jan 11 '23 at 01:50
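To make the "does not decode the address correctly" hypothesis from the comments concrete: when no device claims an I/O read, the CPU typically gets all ones back, which matches the inl() result. The userspace model below is hypothetical and not based on the actual CPLD, whose design is unknown. `decodes_ok()` is a correct positive decoder for a power-of-two I/O BAR; `decodes_faulty()` represents a made-up defect in which some address bits (`stuck_mask`) are compared against a fixed pattern instead of the programmed BAR, so the card responds at some base addresses and silently ignores others.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Correct decode: claim every port whose bits above log2(size) match the
 * programmed BAR. x86 I/O ports are 16 bits wide. */
static bool decodes_ok(uint16_t port, uint16_t bar_base, uint16_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0); /* power-of-two BAR */
    uint16_t mask = (uint16_t)~(size - 1);
    return (port & mask) == (bar_base & mask);
}

/* HYPOTHETICAL faulty decode: address bits in stuck_mask are compared
 * against stuck_value instead of the BAR, e.g. a design only ever tested
 * at one base address. */
static bool decodes_faulty(uint16_t port, uint16_t bar_base, uint16_t size,
                           uint16_t stuck_mask, uint16_t stuck_value)
{
    uint16_t mask = (uint16_t)~(size - 1);
    uint16_t expected = (uint16_t)((bar_base & ~stuck_mask) |
                                   (stuck_value & stuck_mask));
    return (port & mask) == (expected & mask);
}
```

For example, a decoder that (hypothetically) required address bit 8 to be 0 would work with a BAR at 0xEE00 but never claim any port of a BAR at 0xEF00, so every read there would come back as 0xFFFFFFFF.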
