I am trying to get notified about stopped containers with the following alert:

```yaml
alert: artifactory_down
expr: absent(container_memory_usage_bytes{name="artifactory"})
for: 1m
labels:
  severity: critical
annotations:
  description: Artifactory container is down for more than 60 seconds.
  summary: Artifactory down
```

Unfortunately, the time series has gaps of between 1 and 5 minutes, which trigger false alerts even though the container is still running.
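To see exactly where the gaps fall, a recording rule along these lines could be graphed next to the alert. It is only a sketch: the group and rule names are hypothetical, and the expected count of about 4 samples per minute assumes a 15-second scrape interval.

```yaml
groups:
  - name: artifactory-gap-diagnostics  # hypothetical group name
    rules:
      # Counts the samples each matching series received over the last minute.
      # With an assumed 15s scrape interval a healthy series records about 4;
      # during a gap the recorded series has no value at all.
      - record: artifactory:container_memory_usage_bytes:count1m  # hypothetical name
        expr: count_over_time(container_memory_usage_bytes{name="artifactory"}[1m])
```

If more than one series shows up for the same container, that suggests its label set is changing between scrapes rather than samples simply going missing.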

Any idea what could cause this, or how to analyse it further?

Christian Schyma

1 Answer

I'm guessing you're using an old version of cAdvisor. Make sure you're running at least 0.27.4, which includes the fix I made for label consistency. Also check that scrapes of cAdvisor are succeeding, i.e. that the up metric is 1.
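A rule along these lines would flag failed cAdvisor scrapes directly. Prometheus sets up to 1 for a successful scrape and 0 for a failed one; the job name cadvisor and the group name are assumptions, so substitute the job from your own scrape configuration.

```yaml
groups:
  - name: cadvisor-scrape-health  # hypothetical group name
    rules:
      - alert: cadvisor_scrape_failing
        # up is 1 when the last scrape of the target succeeded, 0 when it failed.
        # job="cadvisor" is an assumed job name; match it to your scrape config.
        expr: up{job="cadvisor"} == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: cAdvisor scrape failing
```

If this fires at the same times as artifactory_down, the gaps come from failed scrapes rather than from the container stopping.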

brian-brazil
Thanks, that did it! After updating cAdvisor from 0.26.3 to 0.29.0, I also had to adapt the volume mounts as described by summera at https://github.com/google/cadvisor/issues/1843, which seems to be RHEL-related. – Christian Schyma May 07 '18 at 13:14