I want to understand how a CPU works and so I want to know how it communicates with a PCIe card.

Which instructions does the CPU use to initialize a PCIe port and then read from and write to it?

For example OUT or MOV.

zomega

1 Answer

A CPU mainly communicates with PCIe cards through memory ranges they expose. These ranges may be small for network or sound cards and very large for graphics cards. Integrated GPUs also have a small amount of their own memory but share most of the main memory. Most other cards can also read and write main memory themselves (DMA).

To set up a PCIe device, the OS writes to the device's configuration space. On x86, the BIOS or bootloader tells the OS where this configuration space is located. PCI devices are connected in a tree, which may include switches and bridges on larger computers; `lspci -t` shows this tree. Thunderbolt can even attach external devices to it. This is why the OS needs to recursively "probe" the tree to find PCI devices and configure them.
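
As a rough illustration of how a specific device's configuration registers are addressed during that probe, here is a minimal sketch in C. It covers the two standard x86 mechanisms: the legacy one, where an address is written to I/O port 0xCF8 with OUT and the data is accessed at port 0xCFC, and the memory-mapped PCIe one (ECAM), which is reached with ordinary MOV. The function names and the `ecam_base` parameter are made up for this example; a real OS would obtain the ECAM base from the firmware's ACPI MCFG table.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy x86 mechanism: the value written with OUT to I/O port 0xCF8
 * (CONFIG_ADDRESS) before the data is read or written at port 0xCFC. */
uint32_t pci_legacy_config_address(uint8_t bus, uint8_t dev,
                                   uint8_t func, uint8_t offset)
{
    assert(dev < 32 && func < 8);
    return 0x80000000u                /* enable bit */
         | ((uint32_t)bus  << 16)
         | ((uint32_t)dev  << 11)
         | ((uint32_t)func << 8)
         | (offset & 0xFCu);          /* dword-aligned register offset */
}

/* PCIe ECAM: every function gets 4 KiB of memory-mapped configuration space.
 * ecam_base is an assumption of this sketch (normally taken from ACPI MCFG). */
uint64_t pcie_ecam_address(uint64_t ecam_base, uint8_t bus, uint8_t dev,
                           uint8_t func, uint16_t offset)
{
    assert(dev < 32 && func < 8 && offset < 4096);
    return ecam_base
         + ((uint64_t)bus  << 20)
         + ((uint64_t)dev  << 15)
         + ((uint64_t)func << 12)
         + offset;
}
```

Probing the tree then amounts to looping over bus, device and function numbers, reading the vendor ID at offset 0 of each computed address, and descending into any bridge that is found.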

Synchronization uses interrupts and ring buffers. The CPU writes work items to a ring buffer in memory, then writes the updated head pointer to a second memory location. That location is on the device, so the device notices the write and wakes up when there is work to do. When the device has finished the work, it sends a prenegotiated interrupt to the CPU.
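
The submit side of such a ring can be sketched as follows. This is a simplified model, not any particular device's layout: the descriptor fields, `RING_SIZE`, and the names `producer_index` (what the paragraph above calls the head pointer) and `doorbell` are all assumptions for illustration, and the doorbell here is an ordinary variable standing in for a register on the device.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical; power of two so indices wrap with a mask */

struct descriptor {
    uint64_t buffer_addr;  /* where the data for this work item lives */
    uint32_t length;       /* size of that data in bytes */
};

struct ring {
    struct descriptor slots[RING_SIZE]; /* in main memory; the device reads it via DMA */
    uint32_t producer_index;            /* next slot the CPU will fill */
    uint32_t consumer_index;            /* next slot the device will take */
    volatile uint32_t *doorbell;        /* device register in reality, plain memory here */
};

/* Queue one work item and notify the device. Returns 0 on success, -1 if full. */
int ring_submit(struct ring *r, uint64_t addr, uint32_t len)
{
    assert(r && r->doorbell);
    if (r->producer_index - r->consumer_index == RING_SIZE)
        return -1;  /* the device has not caught up yet */

    struct descriptor *d = &r->slots[r->producer_index & (RING_SIZE - 1)];
    d->buffer_addr = addr;
    d->length = len;
    r->producer_index++;

    /* A real driver would issue a write barrier here so the descriptor is
     * visible to the device before the doorbell write. */
    *r->doorbell = r->producer_index;
    return 0;
}
```

In a real driver the device would advance the consumer side and raise an interrupt; in this sketch a test can simply bump `consumer_index` by hand to free a slot.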

Most interaction with modern devices uses MOV rather than OUT. The I/O port concept is very old and poorly suited to the amount of data modern systems move. Exposing device functionality as memory rather than through a separate mechanism allows vectorized variants of MOV to move 32 bytes or so at a time. Graphics cards and modern network cards that support offload can also use their own hardware to write results back to main memory when instructed to do so. The CPU can then read the results later, when it is free, again using MOV.
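
In C, this kind of access is usually written as a load or store through a `volatile` pointer, which x86-64 compilers typically turn into a single MOV. The helpers below are a minimal sketch; the names `mmio_read32` and `mmio_write32` are invented for this example, and `base` is assumed to point at an already mapped device region (or, for experimenting, at an ordinary array).

```c
#include <assert.h>
#include <stdint.h>

/* Read one 32-bit device register. volatile keeps the compiler from
 * caching, merging, or dropping the access. */
uint32_t mmio_read32(const volatile void *base, uint32_t offset)
{
    assert(offset % 4 == 0);  /* registers are accessed at their natural alignment */
    return *(const volatile uint32_t *)((const volatile uint8_t *)base + offset);
}

/* Write one 32-bit device register. */
void mmio_write32(volatile void *base, uint32_t offset, uint32_t value)
{
    assert(offset % 4 == 0);
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = value;
}
```

For example, with `uint32_t fake_bar[4] = {0};` standing in for a device region, `mmio_write32(fake_bar, 0x04, 1)` stores into `fake_bar[1]`. Against real hardware the same store would travel over PCIe to the card.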

Before this memory access works, the OS needs to set up the memory mapping. On the device side, the mapping is set in the PCI configuration space through the BARs (Base Address Registers). On the CPU side, it is set up in the page tables. CPUs usually keep data in local caches because access to RAM is slower. This is a problem when the data must actually reach a PCI device, so the OS marks the relevant memory as write-through or even uncacheable to guarantee that it does.
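
To make the BAR part more concrete, here is a small sketch of how the raw value of a BAR can be interpreted, following the standard PCI bit layout (bit 0 selects I/O or memory, bits 2:1 the memory type, bit 3 prefetchability). The struct and function names are assumptions for this example, and the size helper only handles the simple case of a 32-bit memory BAR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bar_info {
    bool is_io;         /* bit 0: 1 = I/O port range, 0 = memory range */
    bool is_64bit;      /* memory BAR, bits 2:1 == 2: next BAR holds the upper 32 bits */
    bool prefetchable;  /* memory BAR, bit 3 */
    uint64_t base;      /* address with the flag bits masked off */
};

/* Decode a BAR; bar_high is the following BAR and is only used for 64-bit BARs. */
struct bar_info bar_decode(uint32_t bar_low, uint32_t bar_high)
{
    struct bar_info info = {0};
    info.is_io = bar_low & 0x1u;
    if (info.is_io) {
        info.base = bar_low & ~0x3u;
        return info;
    }
    info.is_64bit = ((bar_low >> 1) & 0x3u) == 0x2u;
    info.prefetchable = (bar_low >> 3) & 0x1u;
    info.base = bar_low & ~0xFu;
    if (info.is_64bit)
        info.base |= (uint64_t)bar_high << 32;
    return info;
}

/* Size probe for a 32-bit memory BAR: the OS writes all ones to the BAR and
 * reads it back; the device hardwires the low address bits to zero.
 * readback is the value read after that write. */
uint32_t bar_size_from_probe(uint32_t readback)
{
    assert((readback & ~0xFu) != 0);  /* an all-zero readback means the BAR is unimplemented */
    return ~(readback & ~0xFu) + 1u;
}
```

A readback of `0xFFF00000`, for instance, decodes to a 1 MiB region. The OS then picks a free, suitably aligned physical address, writes it into the BAR, and maps that range in its page tables.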

GPU vendors often use the term BAR in marketing (as "Resizable BAR"). What they are selling is the ability to map a larger region of the card's memory at once. Without it, OSes can map only a limited window at a time and must repeatedly remap that window to reach other parts of the memory. This illustrates how important MOV access to PCIe devices is.

Daniel T
  • Thank you for the answer. Can you please also tell me the following: I am quite sure there is a PCIe chip between the card and the CPU. Something like a "PCIe hub chip". What is such a chip called? Is it on the mainboard or on the CPU? Is it bundled with other things or does it have a separate chip? – zomega Nov 24 '22 at 10:25
  • On my laptop, I don't have anything except for `00:00.0 Host bridge: Intel Corporation Ice Lake-LP Processor Host Bridge/DRAM Registers (rev 03)` then the iGPU, network, SSD connected directly to it. There is a bridge for the dedicated GPU on my desktop. Are you thinking of the northbridge/southbridge? Those were merged into the processor a long time ago. – Daniel T Nov 24 '22 at 10:39
  • Here is the lspci of my computers: https://gist.github.com/danielzgtg/945be210be5be06bf0b8fdefb67ae153 . As you see, it's either nothing or processor root ports or host bridges – Daniel T Nov 24 '22 at 10:48
  • I want to know about the PCIe hub chips. Are northbridge/southbridge such? – zomega Nov 24 '22 at 10:57
  • I don't have a PCI hub, northbridge, or southbridge in my computer. The northbridge/southbridge is a PCI bridge. The external PCIe hub I Googled is probably either a bridge or a switch. – Daniel T Nov 24 '22 at 11:05
  • The other big advantage of `mov` is that MMIO stores to write-combining memory are much faster than an `out` instruction, not serializing the pipeline. Even a UC store is strongly ordered but not *as* serializing as an `in` or `out` instruction. – Peter Cordes Nov 24 '22 at 11:14
  • A PCI bridge chip works like a hub. – zomega Nov 24 '22 at 11:20
  • @zomega It's called the PCIe Root Complex and is implemented on the System Agent on Intel CPUs. The PCH (which replaced the north/southbridge architecture) connects to the CPU through DMI, which extends the PCIe Root Complex with more Root Ports. Previously you had a Host-to-PCI bridge on the south bridge and an AGP (a secondary, faster, PCI bridge) on the north bridge. – Margaret Bloom Nov 24 '22 at 14:48