I am trying to get notified about stopped containers with the following alert:
    alert: artifactory_down
    expr: absent(container_memory_usage_bytes{name="artifactory"})
    for: 1m
    labels:
      severity: critical
    annotations:
      description: Artifactory container is down for more than 60 seconds.
      summary: Artifactory down
Unfortunately, there are gaps in the time series, which result in erroneous alerts even though the container is still running. The gaps last between 1 and 5 minutes.
Any idea what could cause this, or how I could analyse it further?
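One way I could narrow this down is to export the raw sample timestamps for the series (for example, with a range-vector query such as container_memory_usage_bytes{name="artifactory"}[1h]) and check where consecutive samples are further apart than the scrape interval. Below is a minimal sketch of that check. The function name find_gaps, the 15-second scrape interval, the 1.5x tolerance, and the sample timestamps are all assumptions for illustration and are not taken from my actual setup.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[float],
    scrape_interval: float,
    tolerance: float = 1.5,
) -> List[Tuple[float, float, float]]:
    """Return (gap_start, gap_end, gap_seconds) for each pair of consecutive
    samples spaced more than tolerance * scrape_interval apart."""
    ordered = sorted(timestamps)
    threshold = scrape_interval * tolerance
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > threshold:
            gaps.append((prev, cur, delta))
    return gaps


# Hypothetical sample timestamps in seconds: a 15 s scrape interval
# with one 2-minute hole between t=45 and t=165.
samples = [0, 15, 30, 45, 165, 180, 195]
print(find_gaps(samples, scrape_interval=15))  # → [(45, 165, 120)]
```

Comparing the reported gap windows against the `up` series for the scrape target over the same period might then show whether whole scrapes failed or only this one series went missing.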