I have a strange problem in Docker Swarm: even though I have replicas set to 1, after deploying a new version I sometimes end up with too many containers running (it looks like the previous container isn't killed after the new one is created). To get it working correctly I need to rerun stack deploy. I don't know how to fix this yet, so I want to create a Prometheus alert that fires when it happens. I tried an expression that I took straight from my Grafana config, and I don't know why it fails with this error:
rule 4, "too_many_containers_per_service": could not parse expression: parse error at char 72: unexpected character inside braces: '\\\\'"
Edit: I've made some progress. I was able to run the Prometheus container without any error, but I don't get any alerts when there is more than one container of a service. I'm not sure what is wrong.
The config:
- alert: too_many_containers_per_service
  expr: sum(rate(container_last_seen{container_label_com_docker_swarm_node_id=~"node_id"}[5m])) by (container_label_com_docker_swarm_service_name) > 1
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Too many containers of {{ $labels.service_name }} are running simultaneously!
    summary: Containers duplicate alert for service '{{ $labels.service_name }}'
UPDATE:
I was able to make it work by removing the node filter (I didn't need one, since I run a single-node swarm). My config now looks like this:
- alert: too_many_containers_per_service
  expr: count(container_last_seen) by (container_label_com_docker_swarm_service_name) > 1
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Too many containers of '{{ $labels.container_label_com_docker_swarm_service_name }}' are running simultaneously!
    summary: Containers duplicate alert for service '{{ $labels.container_label_com_docker_swarm_service_name }}'
The problem I have now is that I keep getting one alert for a service with an empty ("null") name:
Too many containers of '' are running simultaneously!
What is causing this? The alert never goes away.
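One thing I could try, as a rough sketch only: my guess (not verified) is that the alert with the empty name comes from series that have no swarm service label at all, for example containers started outside of swarm. `count(...) by (...)` would then group all of them together under an empty label value, and that group would always have more than one member. If that assumption holds, excluding empty label values in the selector should drop that group:

```yaml
- alert: too_many_containers_per_service
  # Assumption: the '' alert is caused by series without a swarm service label.
  # The !="" matcher keeps only series where the label is present and non-empty.
  expr: count(container_last_seen{container_label_com_docker_swarm_service_name!=""}) by (container_label_com_docker_swarm_service_name) > 1
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Too many containers of '{{ $labels.container_label_com_docker_swarm_service_name }}' are running simultaneously!
    summary: Containers duplicate alert for service '{{ $labels.container_label_com_docker_swarm_service_name }}'
```

Running `count(container_last_seen) by (container_label_com_docker_swarm_service_name)` in the Prometheus expression browser should show whether a group with an empty service name actually exists before changing the rule.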