
I want to create an alert rule that fires when a pod restarts, i.e. if the pod restarts twice within a 30-minute window.

I have the following Log Analytics query:

KubePodInventory
| where ServiceName == "xxxx"
| project PodRestartCount, TimeGenerated, ServiceName
| summarize AggregatedValue = count(PodRestartCount) by ServiceName, bin(TimeGenerated, 30m) 

But setting the alert threshold to 2 won't work here, because PodRestartCount is a cumulative counter that is not reset between windows. Any help would be greatly appreciated. Maybe there is a better approach that I'm missing.

user2342643
  • Are you basing your alerts on kube-state-metrics metrics? – djsly May 12 '20 at 00:27
  • Here's an example using kube-state-metrics data: https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/prometheus/rules-1.14/kubernetes-apps.yaml#L28-L35 – djsly May 12 '20 at 00:33
  • I tried your query. Shouldn't you be using a rate instead of `count`? You are right that PodRestartCount is a counter, so it will keep incrementing until the pod changes ID. – djsly May 12 '20 at 00:52

1 Answer


To reset the count between bin() intervals, you can use the prev() function on serialized (ordered) output to compute the difference between consecutive bins:

KubePodInventory
| where ServiceName == "<service name>" 
| where Namespace == "<namespace name>"
| summarize AggregatedPodRestarts = sum(PodRestartCount) by bin(TimeGenerated, 30m) 
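// summarize does not guarantee row order; sorting serializes the rows in time order so prev() returns the previous bin
| order by TimeGenerated asc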
| extend prevPodRestarts = prev(AggregatedPodRestarts,1)
| extend diff = AggregatedPodRestarts - prevPodRestarts
| where diff >= 2

This outputs the difference in the aggregated restart count between consecutive bins. Sample output:

TimeGenerated [UTC]          prevPodRestarts   diff        AggregatedPodRestarts
5/12/2020, 12:00:00.000 AM   1,368,477         191,364     1,559,841
5/11/2020, 11:00:00.000 PM   1,552,614         3,594       1,556,208
5/11/2020, 10:00:00.000 PM   182,217           1,370,397   1,552,614
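The answer's query sums PodRestartCount over every inventory record in a bin. Assuming KubePodInventory emits a row for each pod (and container) on every collection cycle, those totals grow with the number of samples as well as with actual restarts, which fits the very large values in the sample output. Below is a per-pod variant, offered as a sketch rather than a tested alert rule. It keeps the latest cumulative count per pod in each 30-minute bin with max(), sorts by pod and time so prev() compares each bin with the same pod's previous bin, and drops rows where the previous row belongs to a different pod. It assumes the PodUid column identifies a pod instance (the counter starts over when the pod is recreated with a new ID); the `<service name>` and `<namespace name>` placeholders are the same as in the answer's query.

```kql
// Sketch: approximate restarts per pod between consecutive 30-minute bins.
// Assumes KubePodInventory has PodUid, ServiceName, Namespace, PodRestartCount and TimeGenerated.
KubePodInventory
| where ServiceName == "<service name>"
| where Namespace == "<namespace name>"
// The counter is cumulative, so the max in a bin is the latest value seen for that pod.
| summarize MaxRestarts = max(PodRestartCount) by PodUid, bin(TimeGenerated, 30m)
// Sorting serializes the rows so prev() returns the same pod's previous bin.
| order by PodUid asc, TimeGenerated asc
| extend prevPodUid = prev(PodUid), prevRestarts = prev(MaxRestarts)
// Only compare bins that belong to the same pod instance.
| where PodUid == prevPodUid
| extend diff = MaxRestarts - prevRestarts
| where diff >= 2
```

Because the filter on `diff >= 2` is already inside the query, one option for the alert itself is to fire when the query returns any rows; the exact alert configuration is not covered here.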

References:
  • serialize operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/serializeoperator
  • prev() function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/prevfunction
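To see why the rows need a defined order before prev() is applied, here is a small self-contained query over hypothetical in-memory data: one pod's cumulative restart counter, one row per 30-minute bin, deliberately listed out of time order. It uses datatable instead of KubePodInventory, so it needs no cluster data.

```kql
// Hypothetical cumulative restart counts for one pod, one row per 30-minute bin,
// listed out of time order on purpose.
datatable(TimeGenerated: datetime, Restarts: long)
[
    datetime(2020-05-11 22:00:00), 3,
    datetime(2020-05-11 21:30:00), 1,
    datetime(2020-05-11 22:30:00), 6
]
// Sorting serializes the rows in time order, so prev() returns the earlier bin.
| order by TimeGenerated asc
| extend prevRestarts = prev(Restarts, 1)
// The first row has no previous row, so prevRestarts and diff are empty there.
| extend diff = Restarts - prevRestarts
```

With the sort in place, the 22:00 row gets a diff of 2 and the 22:30 row a diff of 3. If `order by` were replaced with a bare `serialize`, prev() would follow whatever order the rows arrived in, which here could compare 21:30 with 22:00 and produce a negative diff.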

djsly