Nagios receives OK passive alerts but still reports "passive check is stale"

Question

I am doing some monitoring on Nagios using passive alerts. I am seeing some strange behavior: the passive alerts are being received by Nagios, but Nagios insists that they are stale.

Here is some logging; why does Nagios keep generating a SERVICE ALERT if an OK result was just received?

[1527969438] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969440] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
[1527969851] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969855] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527969860] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
[1527970279] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527970280] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527970280] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527970285] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527970285] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527970295] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale

Here is the relevant configuration:

define service {
    use                     ldap-nprod-service-template
    hostgroup_name          ldap-aws-uat-all-hostgroup
    service_description     ldap_base
    active_checks_enabled   0          
    passive_checks_enabled  1          
    check_freshness         1          
    freshness_threshold     900        
    check_command           check_freshness_critical
}

define host {
    use             ldap-nprod-host-template
    host_name       ldap-uat-sh.example.com
    alias           ldap-uat-sh.example.com
    address         ldap-uat-sh.example.com
    check_command   check_dummy_host
}

define hostgroup {
    hostgroup_name  ldap-aws-uat-all-hostgroup
    alias           LDAP AWS UAT ALL Group
    members         ldap-uat-sh.example.com
}
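
As a sanity check on the numbers above, here is a minimal Python sketch that compares the spacing of the three PROCESS_SERVICE_CHECK_RESULT entries in the log with the configured freshness_threshold of 900 seconds. The helper names (gaps, stale_gaps) are hypothetical, and the strict "greater than" comparison is an assumption about how staleness is judged, not a statement of how Nagios implements it internally.

```python
# Epoch timestamps of the PROCESS_SERVICE_CHECK_RESULT lines in the log above.
passive_result_times = [1527969438, 1527969851, 1527970279]

# freshness_threshold from the service definition, in seconds.
FRESHNESS_THRESHOLD = 900


def gaps(times):
    """Return the seconds elapsed between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def stale_gaps(times, threshold):
    """Return only the gaps that exceed the threshold (assumed comparison)."""
    return [gap for gap in gaps(times) if gap > threshold]


print(gaps(passive_result_times))
print(stale_gaps(passive_result_times, FRESHNESS_THRESHOLD))
```

Under these assumptions the results arrive roughly every seven minutes, well inside the 15-minute threshold, so the interval between results alone would not seem to explain the stale alerts.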

user35042 · Answer 1 · 2019-10-30T23:15:51.420


I took out the problematic monitors from Nagios, restarted Nagios, and then added the monitors back in. This cleared the issue.

My guess is that there is a bug in the way Nagios figures out when it is flapping, and the timing of when it receives passive alerts can get it into this strange state.

edited Oct 30 '19 at 23:15

answered Jun 10 '18 at 11:16

user35042


I am currently facing the same issue. What do you mean by taking the problematic alerts out of Nagios? Could you rephrase or explain in more detail? Thanks. – rookie099 Oct 30 '19 at 06:34
