I have looked into similar questions on this site (listed at the end) but still feel I am missing a couple of points; hopefully someone can help here:

  1. Is there a hook in the proc file system that connects the /proc/iomem inode to a function that dumps the information? I wasn't able to find where this function lives in procfs: a grep for iomem under fs/proc in the Linux source tree turned up nothing. So maybe it is more of a procfs question... The answer to this question might help me dig up the answer to the next one.

  2. /proc/iomem has more entries than the BIOS E820 information I extracted from either dmesg or /sys/firmware/memmap (those two are consistent with each other). For example, /sys/firmware/memmap does not seem to contain the PCI memory-mapped regions. Drivers' init code calls request_mem_region() and adds more entries to the map, so somewhere there should be a global variable (a root of all resources?) that holds this tree?

The questions on stackoverflow I have looked into:

jww
QnA

1 Answer

  1. struct resource iomem_resource is what you're looking for. It is defined and initialized in kernel/resource.c, which is also where the /proc/iomem entry is created, via proc_create_seq_data() (that is why grepping under fs/proc finds nothing). In the same file, the struct seq_operations instance resource_op defines what happens when you, for example, cat the file from userland.
  2. iomem_resource is a globally exported symbol, used throughout the kernel, drivers included, to request resources. You can find calls to devm_request_resource()/request_resource() scattered across the kernel, taking either iomem_resource or its sibling ioport_resource, based on either fixed settings or on configuration. Examples of configuration sources are a) device trees, prevalent in embedded settings, and b) E820 or UEFI, found mostly on x86.

Starting with b), which was asked about in the question: arch/x86/kernel/e820.c shows how reserved memory gets inserted into /proc/iomem via insert_resource(). This excellent link has more details on the dynamics of requesting the memory map from the BIOS.

An alternative sequence (which relies on CONFIG_OF) by which a device driver requests the resources it needs:

  1. The Open Firmware API traverses the device tree and finds a matching driver, for example via a struct of_device_id.
  2. The driver defines a struct platform_driver which contains both the of_device_id match table and a probe function. This probe function is then called.
  3. Inside the probe function, a call to platform_get_resource() reads the reg property from the device tree. This property defines the physical memory map for the specific device.
  4. A call to devm_request_mem_region() (essentially request_mem_region() with managed cleanup) actually allocates the resource and adds it to /proc/iomem.
seldak
  • OF is not typical for x86, and, looking at the mentioned E820, I think this is the case for the OP. – 0andriy Sep 17 '19 at 06:41
  • @0andriy OF was meant as an example; this is what I meant by typical. The `iomem_resource` variable -- which is the answer to the OP's second question -- applies whether the driver is probed using OF or E820. Anyway, I will modify the answer to be less ambiguous. – seldak Sep 17 '19 at 13:27
  • Thanks guys, this almost cleared up all my questions, except for one piece: if the E820 output does not include the PCI mmap region, does that mean that by the time the kernel takes over, the PCI mmap is not set up yet? If so, then request_resource() would have to contain some low-level code that sets the host bridge registers, so that future addresses requested by PCI devices can be forwarded to the PCI bridge. But I did not find this code in request_resource(). – QnA Sep 18 '19 at 01:58
  • (Too long, had to split into two replies.) Another guess: the BIOS reserves a huge space for all possible future PCI requests before handing control to the kernel, but just doesn't report it via E820, and the kernel somehow digs it out by other means? That way request_resource() would not need to tweak host bridge registers, because devices would just be requesting a sub-region of a bigger region that is already wholly mapped/forwarded to the PCI bridge. Does that remotely make sense? – QnA Sep 18 '19 at 02:02
  • @QnA, I'm not sure I fully understand you, as I'm not a PCI expert, but wouldn't that fall under a PCI root complex driver's responsibility? From what I understand, PCI has a different address space and you may have an endpoint hotplugged at any time. I'm confident the registers for the root complex are somewhat reserved using the mentioned mechanisms and subsequently configured, but the actual memory used depends on the driver. My untested guess is that an endpoint would initiate enumeration according to the PCI protocol, which triggers an ISR, and then the RC would start reserving memory for the endpoint. – seldak Sep 18 '19 at 02:17
  • @QnA My continued response, as I ran out of chars. The way the RC would reserve memory would (I think) be via the DMA kernel API rather than request_region(). Anyway, I think PCI deserves its own question; otherwise you may want to rename the question title to "How procfs outputs /proc/iomem for PCI?" – seldak Sep 18 '19 at 02:27
  • @QnA, PCI is a **self-discovery** bus, and what you are talking about is hidden under *drivers/pci/setup-res.c*. Also, for PCI-ACPI based platforms the information is propagated via the MCFG table according to the PCI Firmware specification. – 0andriy Sep 18 '19 at 11:11
  • Thanks for the pointers. ACPI seems to be the answer for setting up PCI memory forwarding at the host bridge level; see Documentation/PCI/acpi-info.rst. – QnA Sep 19 '19 at 22:27