I want to understand how a CPU works and so I want to know how it communicates with a PCIe card.
Which instructions does the CPU use to initialize a PCIe port and than read and write to it?
For example OUT or MOV.
I want to understand how a CPU works and so I want to know how it communicates with a PCIe card.
Which instructions does the CPU use to initialize a PCIe port and than read and write to it?
For example OUT or MOV.
A CPU mainly communicates with PCIe cards through memory ranges they expose. This memory may be small for network or sound cards, and very large for graphics cards. Integrated GPUs have also have their own tiny memory but share most of the main memory. Most other cards also have read/write access to main memory.
To set up the PCIe device, the configuration space is written to. On x86, the BIOS or bootloader will provide the location of this data. PCI devices are connecting in a tree which may include hubs and bridges on larger computers and this can be shown in lspci -t
. Thunderbolt can even connect to external devices. This is why the OS needs to recursively "probe" the tree to find PCI devices and configure them.
Synchronization uses interrupts and ring buffers. The device can send a prenegotiated interrupt to the CPU when it's done doing work. The CPU writes work to a ring buffer. It then writes another memory location that contains the head pointer. This memory location is located on the device so it can listen to writes there and wake up when there is work to do.
Most of the interaction for modern devices will use MOV instead of OUT. The I/O ports concept is very old and not very suitable for the massive amount of data on modern systems. Having devices expose their functionality as a type of memory instead of a separate mechanism allows vectorized variants of MOV to move 32 bytes or similar at a time. With graphics card and modern network cards supporting offload, they can also use their own hardware to write results back to main memory when instructed to do so. The CPU can then read the results when it's free later, again using MOV.
Before this memory access works, the OS will need to set up the memory mapping properly. The memory mapping is set in the PCI configuration space as BARs. On the CPU side it is set up in the page tables. CPUs usually have caches to keep data locally because access to RAM is slower. This causes a problem when the data needs to get to a PCI device, so the OS will set certain memory as write-through or even uncacheable so this is ensured.
The word BAR is often marketed by GPU vendors. What they are selling is the ability to map a larger region of memory at a time. Without that, OSes have been just unmapping and reinitializing by remapping a limited window of memory at a time. This exemplifies the importance of MOV accessing PCIe devices.