New to Prometheus Alerting!
I have a Prometheus Counter that has multiple Child metrics, each of which keeps incrementing on its own specific condition.
I query them with the expression below, which returns the sum of each individual Child metric over the past 24 hours:
Expression: floor(sum by (app_kubernetes_io_name, kubernetes_namespace, failure, owner_team) (increase(failure_stats_total[24h]))) > 0
On a given day, the result looks like this:
{app_kubernetes_io_name="consumer", failures="APP_FAILED", kubernetes_namespace="dev", owner_team="Team C"}
32
{app_kubernetes_io_name="consumer", failures="APP_TRANSFER_FAILED", kubernetes_namespace="dev", owner_team="Team C"}
10
{app_kubernetes_io_name="consumer", failures="DEVICE_FAILED", kubernetes_namespace="dev", owner_team="Team C"}
30
My question is: how do I fire a single Slack alert every 24 hours that summarizes all the failures that occurred over the past day, with their respective counts?
I'm not sure if group_by is the right choice here. Please advise.
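To make the question concrete, here is a rough, untested sketch of the kind of setup I have in mind, not a known-working configuration. The alert name, group name, receiver name, Slack channel, webhook placeholder, and the `count` annotation are all made up for illustration. It also assumes the label is called `failure`, as in the expression (the sample output above shows `failures`). The idea is that Alertmanager's `group_by` collapses the per-failure alerts into one notification, and `repeat_interval` spaces repeats 24 hours apart.

```yaml
# --- Prometheus rule file (hypothetical names) ---
groups:
  - name: daily-failure-summary
    rules:
      - alert: DailyFailureSummary
        expr: floor(sum by (app_kubernetes_io_name, kubernetes_namespace, failure, owner_team) (increase(failure_stats_total[24h]))) > 0
        annotations:
          # Carry the 24h count of each series into the notification.
          count: "{{ $value }}"
---
# --- Alertmanager config (hypothetical receiver and channel) ---
route:
  receiver: slack-daily-summary
  # One notification per alert name and team, instead of one per failure type.
  group_by: [alertname, owner_team]
  group_wait: 30s
  # How long to wait before notifying about new alerts joining the group.
  group_interval: 5m
  # How long to wait before re-sending an unchanged, still-firing group.
  repeat_interval: 24h
receivers:
  - name: slack-daily-summary
    slack_configs:
      - channel: '#failures'                # placeholder
        api_url: '<slack-webhook-url>'      # placeholder
        title: 'Failure summary (past 24h)'
        # One line per firing alert in the group: "<failure>: <count>".
        text: |-
          {{ range .Alerts.Firing }}{{ .Labels.failure }}: {{ .Annotations.count }}
          {{ end }}
```

One thing I am unsure about with this approach: as far as I understand, a new failure type joining the group would trigger another notification after `group_interval`, so this may send more than one message per day unless `group_interval` is also raised.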