Let's say a PCI Express device implements an AER capability for robust error reporting. Whenever the device detects such an error, it populates its AER registers accordingly. How will this error be reported to the root complex? I read in the spec that such errors are reported via a “message TLP”. Does this “message TLP” mean that an MSI will be triggered to send an indication to the root complex? Also, is the same mechanism used for reporting other (non-AER) errors?
1 Answer
PCI-Express errors are reported with a specific type of TLP (Transaction Layer Packet) called Message Request (abbreviated Msg). A Msg is a sort of general-purpose TLP that is used for several purposes (PCI compatibility interrupt signaling, power management, hot plug signaling and more, in addition to error signaling), and is differentiated from other TLP types (Memory Read request, Memory Write request, Completion, Configuration read/write, and so forth).
An error Msg TLP is a message from the device that detected the error specifying the class of error that was detected and in which device it was detected. There are a number of different ways that Msg TLP routing can be specified by a sending device. One of those ways is "Route to root complex", which directs any intermediate component such as a switch (bridge) to forward the TLP upstream to the root complex. Error details are logged in the specific device's AER capability (if it implements one -- most PCI-E devices do), and then a Msg signaling the error is sent upstream to the root complex.
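To make the Msg layout concrete, here is a minimal sketch of decoding the header of an error Msg. It assumes the 4 DW header layout as I understand it from the PCI Express Base Specification (Fmt/Type in byte 0, Requester ID in bytes 4-5, Message Code in byte 7, and routing subfield 000b meaning "route to root complex"). The function and dictionary names are made up for illustration and are not part of any real driver or library.

```python
# Error message codes carried in the Message Code field (header byte 7).
ERROR_MESSAGE_CODES = {
    0x30: "ERR_COR",
    0x31: "ERR_NONFATAL",
    0x33: "ERR_FATAL",
}

ROUTE_TO_ROOT_COMPLEX = 0b000  # r[2:0], the low three bits of the Type field


def decode_error_msg(header: bytes) -> dict:
    """Decode the parts of a 16-byte Msg TLP header used for error signaling."""
    if len(header) != 16:
        raise ValueError("a Msg TLP header is 4 DW (16 bytes)")
    fmt = header[0] >> 5         # Fmt[2:0]
    tlp_type = header[0] & 0x1F  # Type[4:0], encoded as 10rrr for messages
    if fmt != 0b001 or (tlp_type >> 3) != 0b10:
        raise ValueError("not a Msg TLP (message without data)")
    requester_id = (header[4] << 8) | header[5]
    return {
        "to_root": (tlp_type & 0x7) == ROUTE_TO_ROOT_COMPLEX,
        "bus": requester_id >> 8,
        "device": (requester_id >> 3) & 0x1F,
        "function": requester_id & 0x7,
        "message": ERROR_MESSAGE_CODES.get(header[7], "not an error message"),
    }


# Hypothetical example: function 03:00.0 signals a non-fatal error.
example = bytes([0x30, 0, 0, 0, 0x03, 0x00, 0x00, 0x31]) + bytes(8)
print(decode_error_msg(example))
```

The Requester ID is what tells the root complex which device detected the error; the details stay behind in that device's AER registers.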
Once at the root complex, there are two ways of reporting the error to the platform.
- It can be reported as a "system error" (this is the PCI- and PCI-X compatible mechanism) if enabled in the root complex's Root Control Register in the PCI Express Capability. What exactly is done with a system error is platform-specific. On typical x86 machines, an uncorrectable error results in a Machine Check Exception. (There is often a mechanism on the motherboard to record what happened in an event log that may be accessible to software afterward. See also this link for more info: https://askubuntu.com/a/608156/470836)
- The root complex can also generate an interrupt via MSI if so enabled in the root complex's Root Error Command register (part of the root's AER capability). I don't think this is often used.
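The two enable paths above can be sketched as a small lookup. This assumes that the low three bits of both the Root Control register and the command register in the root's AER capability (named Root Error Command in the spec, as far as I know) are the correctable, non-fatal and fatal enables, in that order. `reporting_paths` is a hypothetical helper, not a real API.

```python
SEVERITIES = ("correctable", "nonfatal", "fatal")  # bit 0, bit 1, bit 2


def reporting_paths(root_control: int, root_error_command: int, severity: str) -> list:
    """Return how the root complex would report a received error message."""
    bit = 1 << SEVERITIES.index(severity)
    paths = []
    if root_control & bit:        # System Error on <severity> Error Enable
        paths.append("system error")
    if root_error_command & bit:  # <severity> Error Reporting Enable
        paths.append("interrupt")
    return paths


# Assumed register values: system error enabled for non-fatal and fatal only,
# interrupt reporting enabled for all three severities.
print(reporting_paths(0x0006, 0x0007, "correctable"))  # ['interrupt']
print(reporting_paths(0x0006, 0x0007, "fatal"))        # ['system error', 'interrupt']
```

The two mechanisms are independent, so a given severity can be reported both ways, one way, or not at all.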
There isn't really a different class of errors that are "non-AER errors". However, a device doesn't have to implement an AER capability. In that case, "Devices that do not support the Advanced Error Reporting capability log only the Device Status register bits indicating that an error has been detected." So details of the error would be lost.
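For a device without AER, the Device Status register is all that software has to go on. A minimal sketch of decoding its error bits follows, assuming the bit layout I know from the PCI Express Capability (bits 0 to 3). The names here are illustrative only.

```python
# Error-related bits of the Device Status register (PCI Express Capability).
DEVICE_STATUS_ERROR_BITS = {
    0x0001: "Correctable Error Detected",
    0x0002: "Non-Fatal Error Detected",
    0x0004: "Fatal Error Detected",
    0x0008: "Unsupported Request Detected",
}


def decode_device_status(value: int) -> list:
    """List the error flags that are set in a Device Status register value."""
    return [name for mask, name in DEVICE_STATUS_ERROR_BITS.items() if value & mask]


# Assumed register value with bits 1 and 3 set.
print(decode_device_status(0x000A))
# ['Non-Fatal Error Detected', 'Unsupported Request Detected']
```

Note how little this says: you learn the severity and whether an Unsupported Request was involved, but not which specific error occurred or which TLP triggered it. That is the detail the AER registers would have kept.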
