This error appears when I run a stress test on a server. I have already ruled out the most likely hardware causes: I replaced the OCP and everything connected to it (cables, boards, etc.). I have not swapped the CPUs, RAM, or SSDs, because they seem unlikely to be the cause.

device_id: 0000:64:02.0

    Dmesg check............................[FAIL]
[  250.275668] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  250.275670] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[  250.275671] {1}[Hardware Error]: event severity: corrected
[  250.275672] {1}[Hardware Error]:  Error 0, type: corrected
[  250.275673] {1}[Hardware Error]:   section_type: PCIe error
[  250.275673] {1}[Hardware Error]:   port_type: 4, root port
[  250.275674] {1}[Hardware Error]:   version: 3.0
[  250.275674] {1}[Hardware Error]:   command: 0x0547, status: 0x0010
[  250.275675] {1}[Hardware Error]:   device_id: 0000:64:02.0
[  250.275675] {1}[Hardware Error]:   slot: 6
[  250.275676] {1}[Hardware Error]:   secondary_bus: 0x65
[  250.275676] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
[  250.275677] {1}[Hardware Error]:   class_code: 060400
[  250.275677] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0013
Nathan
  • What problem do you have, what question are you asking? What tool is doing the "Dmesg check", and what is the criteria for failure? – John Mahowald Jan 05 '23 at 15:31

1 Answer

It could be CPU related.

The error was reported on vendor_id: 0x8086, device_id: 0x347a, which a PCI ID lookup lists as pci:8086-347a | Intel | Core i9/Xeon PCIe Port A.

The port type is also root port, so the error was logged by the CPU's PCIe root port rather than by a downstream device.
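To repeat that identification offline, here is a minimal Python sketch that looks up a vendor/device pair in the text of a pci.ids database. The function name lookup_pci_name is made up for illustration, and the sketch assumes the usual pci.ids layout (vendor lines at column 0, device lines indented by one tab, subsystem lines by two tabs).

```python
def lookup_pci_name(pci_ids_text, vendor_id, device_id):
    """Return (vendor_name, device_name) from pci.ids text, or None.

    vendor_id and device_id are hex strings as printed in dmesg,
    e.g. "0x8086" and "0x347a". (Hypothetical helper, not part of any tool.)
    """
    vendor = format(int(vendor_id, 16), "04x")
    device = format(int(device_id, 16), "04x")
    vendor_name = None  # set only while we are inside the matching vendor

    for line in pci_ids_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("C "):
            break  # device-class section: no vendor entries beyond here
        if line.startswith("\t\t"):
            continue  # subsystem entry, not needed for this lookup
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        ident, name = parts[0].lower(), parts[1].strip()
        if not line.startswith("\t"):
            # Vendor line: remember the name only if it is the one we want.
            vendor_name = name if ident == vendor else None
        elif vendor_name is not None and ident == device:
            return vendor_name, name
    return None
```

On many Linux distributions the database lives at /usr/share/misc/pci.ids or /usr/share/hwdata/pci.ids (an assumption about your system), so you could read that file and call lookup_pci_name(text, "0x8086", "0x347a").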

That said, the error was corrected by the hardware. If nothing else is broken and it does not happen often, you can simply ignore it. Otherwise, try swapping the CPU, or run the same test on different hardware.
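To judge whether it "happens often", you could count the reported errors per device over a whole stress run. This is a rough Python sketch, assuming the dmesg output has the same layout as the log in the question; the function name count_pcie_errors and the regular expressions are mine, not part of any existing tool.

```python
import re
from collections import Counter

# Matches e.g. "{1}[Hardware Error]: event severity: corrected"
SEVERITY_RE = re.compile(r"\[Hardware Error\]: event severity: (\w+)")
# Matches e.g. "{1}[Hardware Error]:   device_id: 0000:64:02.0"
# (the "vendor_id: ..., device_id: 0x347a" line is intentionally not matched)
DEVICE_RE = re.compile(
    r"\[Hardware Error\]:\s+device_id: "
    r"([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)

def count_pcie_errors(dmesg_text):
    """Count error records per (PCI address, severity) pair."""
    counts = Counter()
    severity = None
    for line in dmesg_text.splitlines():
        match = SEVERITY_RE.search(line)
        if match:
            severity = match.group(1)  # start of a new error record
            continue
        match = DEVICE_RE.search(line)
        if match and severity is not None:
            counts[(match.group(1), severity)] += 1
            severity = None  # wait for the next record's severity line
    return counts
```

Feed it the saved dmesg text from the run. A handful of corrected entries for 0000:64:02.0 is a very different picture from thousands, or from any entry whose severity is not corrected.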

accessory