I am monitoring services in Nagios using passive checks, and I am seeing some strange behavior: Nagios receives the passive check results but still insists that the checks are stale.

Here is an excerpt from the log. Why does Nagios keep generating a CRITICAL SERVICE ALERT when an OK result was just received?

[1527969438] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969440] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
[1527969851] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969855] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527969860] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
[1527970279] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527970280] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527970280] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527970285] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527970285] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527970295] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
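
As a sanity check on the log above, the following Python sketch computes the gaps between consecutive passive results and compares them with the configured `freshness_threshold` of 900 seconds. The function names and the `LOG` constant are illustrative only; the timestamps are copied from the EXTERNAL COMMAND lines in the excerpt.

```python
import re

# EXTERNAL COMMAND lines copied from the log excerpt above.
LOG = """\
[1527969438] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969851] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527970279] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
"""

# freshness_threshold from the service definition, in seconds.
FRESHNESS_THRESHOLD = 900

PATTERN = re.compile(r"^\[(\d+)\] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;")


def passive_result_times(log_text):
    """Return the epoch timestamps of all submitted passive results."""
    times = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            times.append(int(match.group(1)))
    return times


def gaps(times):
    """Return the number of seconds between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


intervals = gaps(passive_result_times(LOG))
print(intervals)  # [413, 428]
print(all(gap < FRESHNESS_THRESHOLD for gap in intervals))  # True
```

Both gaps are well under 900 seconds, so the results are arriving often enough, and the stale alerts are logged within seconds of each OK result rather than after the threshold has elapsed.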

Here is the relevant configuration:

define service {
    use                     ldap-nprod-service-template
    hostgroup_name          ldap-aws-uat-all-hostgroup
    service_description     ldap_base
    active_checks_enabled   0          
    passive_checks_enabled  1          
    check_freshness         1          
    freshness_threshold     900        
    check_command           check_freshness_critical
}

define host {
    use         ldap-nprod-host-template
    host_name   ldap-uat-sh.example.com
    alias       ldap-uat-sh.example.com
    address     ldap-uat-sh.example.com
    check_command check_dummy_host
}

define hostgroup {
    hostgroup_name  ldap-aws-uat-all-hostgroup
    alias           LDAP AWS UAT ALL Group
    members         ldap-uat-sh.example.com
}
user35042

1 Answer

I took out the problematic monitors from Nagios, restarted Nagios, and then added the monitors back in. This cleared the issue.

My guess is that there is a bug in the way Nagios figures out when it is flapping, and the timing of when it receives passive alerts can get it into this strange state.

user35042
  • I am currently facing the same issue. What do you mean by taking out the problematic alerts from Nagios? Could you rephrase or explain in more detail? Thanks. – rookie099 Oct 30 '19 at 06:34