6

Short overview: Is Alert more severe than Critical?

RFC 5424 briefly defines the syslog severity levels, giving each a short description and a numeric code from 0 to 7. It was my understanding that 0 (Emergency) was the most severe and 7 (Debug) the least.

However, I'm unsure about the relative order of 1 (Alert) and 2 (Critical). The definitions in RFC 5424 are:

  • Alert: action must be taken immediately
  • Critical: critical conditions
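
For reference, here is a minimal sketch of the RFC 5424 numbering in Python. The `Severity` enum and the helpers `is_more_severe` and `pri` are hypothetical names made up for illustration; only the numeric codes and the PRI formula (facility * 8 + severity) come from the RFC. It assumes the usual reading that a lower code means a more severe message.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Syslog severity codes as numbered in RFC 5424."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def is_more_severe(a: Severity, b: Severity) -> bool:
    # Hypothetical helper: assumes a lower numeric code means higher severity.
    return a < b


def pri(facility: int, severity: Severity) -> int:
    # PRI value as described in RFC 5424: facility * 8 + severity.
    return facility * 8 + severity


print(is_more_severe(Severity.ALERT, Severity.CRITICAL))  # True
print(pri(1, Severity.ALERT))  # facility 1 (user-level) with Alert -> 9
```

Under that reading, Alert (1) outranks Critical (2) purely by its code, whatever the labels suggest in plain English.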

However, this site gives a longer description (which is obviously personal opinion) and defines them as:

  • Alert: Should be corrected immediately - notify staff who can fix the problem - example is loss of backup ISP connection
  • Critical: Should be corrected immediately, but indicates failure in a primary system - fix CRITICAL problems before ALERT - example is loss of primary ISP connection

This seems backwards: it implies that Critical is more severe than Alert, even though RFC 5424 seems to rank Alert as more severe. Is there an official stance on this, or any best practice?

Sean Bannister
  • I never saw anything "official" about that, but critical is generally used when everything is down, or will be soon. For example, a hard drive failure in a RAID array is critical, whereas a SMART report of a forthcoming problem is only an alert. – Gregory MOUSSAT Feb 16 '12 at 11:43
  • Thanks @GregoryMOUSSAT, so from this I understand that Critical is more severe than Alert? It seems odd to me, as I thought it was the other way around based on the severity codes. There's also an answer below that offers an alternative view. So I'm actually still confused about whether there's an industry standard. It's starting to look like it depends on the developer/admin. – Sean Bannister Feb 18 '12 at 14:15

3 Answers

3

Critical indicates that something bad is about to happen. Alert indicates that something bad already happened.

Take a look at Building Scalable Syslog Management Solutions on Cisco.com for a good read about managing syslog.

Clayton Dukes
  • I disagree with this - at least in real world terms. There are many "Critical Errors" in many applications, including the Windows Event Viewer. All of these reference events that have occurred or are occurring. – Dan Feb 16 '12 at 15:41
  • Windows events do not conform to syslog standards. In true MS fashion, they completely ignored syslog and designed their own. Note that "real world" is a slippery term, as it is open to interpretation; the same goes for the severity that software authors assign to the events they create. In other words, you may not agree with them, but they are what they are :-) – Clayton Dukes Feb 16 '12 at 15:56
  • @ClaytonDukes Thanks for the response, but I've noticed there's another conflicting response by Gregory so I'm actually still confused as to which it should be. I searched through the Cisco article but couldn't find any direct descriptions. – Sean Bannister Feb 18 '12 at 14:10
2

I think what those examples mean is that if an Alert is triggered, then a Critical has already happened. In the example, Critical is when the primary ISP goes down, and Alert is when the backup ISP then goes down as well (so both the primary and backup ISPs are down). The backup ISP going down by itself is probably not an Alert (maybe a Critical), because the primary ISP would still be up. Similarly, the primary ISP going down is only a Critical and not an Alert, because the system would still be functioning, albeit on the backup ISP. It is still important to fix ASAP.

Sparticuz
0

I think the authors of syslog inadvertently switched critical and alert. Language-wise, alert is akin to 'be advised; pay attention' ('BOLO' in crime shows is a good analogy), 'critical' is akin to 'handle this problem ASAP', and 'emergency' is akin to 'drop what you are doing and fix this NOW'.

The following hypothetical situation might better illustrate the use of Alert and Critical:

  • 2013/1/1: Critical: drive 0 of md0 (RAID-1) shows excessive temperature (55C)
  • 2013/1/5: Critical: drive 0 of md0 (RAID-1) shows increasing bad sector count (34->147)
  • 2013/1/6: Critical: drive 0 of md0 (RAID-1) is failing.
  • 2013/1/6: Alert: drive 1 of md0 (RAID-1) shows excessive temperature (53C)
  • 2013/1/7: Emergency: drive 1 of md0 (RAID-1) shows increasing bad sector count (12->18)

The drive 0 problems are only critical because its mirror is OK. Drive 1's heat problem is an alert because the only remaining drive in the RAID is having trouble; its bad sector count is an emergency because that drive now has two problems and is the only drive left in the array.
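
As a rough sketch of how the hypothetical log above would be treated by a filter that follows the RFC 5424 numbering, here is a small Python example. The `SEVERITY_CODE` table, the `events` list (with messages shortened from the log above), and the `at_least` helper are all made up for illustration; only the codes 0, 1 and 2 come from the RFC. It assumes that a threshold of "Alert" should pass Alert and anything with a lower code.

```python
# RFC 5424 codes for the three labels used in the hypothetical log.
SEVERITY_CODE = {"Emergency": 0, "Alert": 1, "Critical": 2}

# The hypothetical events from the log above, as (date, label, message).
events = [
    ("2013/1/1", "Critical", "drive 0 of md0 shows excessive temperature"),
    ("2013/1/5", "Critical", "drive 0 of md0 shows increasing bad sector count"),
    ("2013/1/6", "Critical", "drive 0 of md0 is failing"),
    ("2013/1/6", "Alert", "drive 1 of md0 shows excessive temperature"),
    ("2013/1/7", "Emergency", "drive 1 of md0 shows increasing bad sector count"),
]


def at_least(log, threshold):
    # Hypothetical helper: keep events whose code is <= the threshold's code,
    # i.e. events at least as severe as the threshold under RFC numbering.
    limit = SEVERITY_CODE[threshold]
    return [event for event in log if SEVERITY_CODE[event[1]] <= limit]


for date, label, message in at_least(events, "Alert"):
    print(f"{date}: {label}: {message}")
```

With a threshold of Alert, only the two drive 1 events get through; the three drive 0 Critical events are filtered out, which matches the escalation in the example.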

Alas, syslog is too entrenched now to change the order of those two labels.