
After rebooting a SuSE 12 host, I looked at dmesg and came across this message:

    GHES: HEST is not enabled!

GHES apparently stands for "APEI Generic Hardware Error Source support". Should GHES/HEST be enabled? What is the benefit? And if it is advisable, how do I enable it?

MarkHelms

1 Answer


I have several SLES machines in production and have never used this. Here's a good overview: https://firmware.intel.com/sites/default/files/resources/A_Tour_beyond_BIOS_Implementing_APEI_with_UEFI_White_Paper.pdf

> Hardware Error Source Table (HEST): The HEST table enables host firmware to declare all errors that platform components can generate, together with the error signaling for each. The host firmware shall create error source entries in HEST for each component (such as processor, PCIe device, PCIe bridge, etc.) and for each type of error, with the corresponding error notification mechanism (signaling) to the OS. These error entries include x86 architectural errors, industry-standard errors, and the generic hardware error source for platform errors. The x86 architectural errors (MCE and CMC) and standard errors (PCIe AER, MSI, and PCI INTx) can be handled by the OS natively. The generic hardware error source can be used for all firmware-first errors and for platform errors (such as memory and board logic) that have no OS-native signaling, so they have to use platform signaling via SCI or NMI.

I guess this might be useful if you really want to monitor all hardware errors.
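Since the HEST is a table the firmware has to provide, a first step is to check whether the firmware exports one at all. Below is a minimal sketch, assuming a Linux host that exposes raw ACPI tables under `/sys/firmware/acpi/tables/` (reading them usually requires root). It decodes the standard 36-byte ACPI table header plus the HEST error source count that follows it. The helper name `parse_hest_header` is hypothetical. If no HEST file is present, there is likely nothing for the kernel to enable, and turning it on would be a firmware/BIOS matter rather than an OS setting (the kernel can also be told to ignore the table with the `hest_disable` boot parameter).

```python
import struct
from pathlib import Path

# Usual sysfs location of the raw HEST table (assumption: standard Linux ACPI sysfs layout).
HEST_PATH = Path("/sys/firmware/acpi/tables/HEST")


def parse_hest_header(data: bytes) -> dict:
    """Decode the 36-byte ACPI table header plus the 4-byte HEST error source count."""
    if len(data) < 40:
        raise ValueError("table too short to be a HEST")
    # Common ACPI header, little-endian, no padding.
    (signature, length, revision, _checksum, oem_id, oem_table_id,
     oem_revision, creator_id, creator_revision) = struct.unpack_from(
        "<4sIBB6s8sI4sI", data, 0)
    if signature != b"HEST":
        raise ValueError(f"unexpected signature {signature!r}")
    if length > len(data):
        raise ValueError("header length exceeds available data")
    # The HEST body starts with the number of error source structures.
    (error_source_count,) = struct.unpack_from("<I", data, 36)
    return {
        "signature": signature.decode("ascii"),
        "length": length,
        "revision": revision,
        "oem_id": oem_id.decode("ascii", "replace").strip(),
        "oem_table_id": oem_table_id.decode("ascii", "replace").strip(),
        "oem_revision": oem_revision,
        "creator_id": creator_id.decode("ascii", "replace"),
        "creator_revision": creator_revision,
        # A valid ACPI table's bytes sum to zero modulo 256.
        "checksum_ok": sum(data[:length]) % 256 == 0,
        "error_source_count": error_source_count,
    }


def main() -> None:
    if not HEST_PATH.exists():
        print("No HEST exported: the firmware does not seem to provide one.")
        return
    try:
        raw = HEST_PATH.read_bytes()
    except PermissionError:
        print(f"HEST present, but reading {HEST_PATH} needs root.")
        return
    for key, value in parse_hest_header(raw).items():
        print(f"{key}: {value}")


if __name__ == "__main__":
    main()
```

Run it as root on the affected host; a nonzero `error_source_count` would mean the firmware does declare error sources that GHES could use.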

Tux_DEV_NULL
  • Very useful definition, which makes things quite clear. – MarkHelms Aug 17 '17 at 09:46
  • Link is broken, [fixed link](https://software.intel.com/content/www/us/en/develop/download/a-tour-beyond-bios-implementing-the-acpi-platform-error-interface-with-the-uefi.html) – idanp Jun 08 '20 at 06:57