Setup
Note: Using pseudo-code instance notation:
ObjectType("<name>", [<attr>: <attr-value>])
We have a Container:
Container("k8s-snapshots")
in a Pod("k8s-snapshots-0")
in a StatefulSet("k8s-snapshots", spec.replicas: 1)
We expect at most 1 Pod to run at any point in time.
We have a Logs-based Counter
Metric("k8s-snapshots/snapshot-created")
with the filter:
resource.type="container"
resource.labels.cluster_name="my-cluster"
logName="projects/my-project/logs/k8s-snapshots"
jsonPayload.event:"snapshot.created"
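To make the filter concrete, here is a minimal Python sketch of which log entries it would count. It assumes entries are plain dicts shaped like the JSON form of a log entry, and it treats `:` as a simple substring ("has") match, ignoring finer points of the filter language such as case handling. The helper name `matches_snapshot_created` and the sample entries are hypothetical and not part of any Google library.

```python
def matches_snapshot_created(entry: dict) -> bool:
    """Return True if a log entry dict would match the metric filter above."""
    resource = entry.get("resource", {})
    labels = resource.get("labels", {})
    event = entry.get("jsonPayload", {}).get("event", "")
    return (
        resource.get("type") == "container"
        and labels.get("cluster_name") == "my-cluster"
        and entry.get("logName") == "projects/my-project/logs/k8s-snapshots"
        # ':' in the filter is approximated here as a substring match
        and "snapshot.created" in str(event)
    )


# Hypothetical sample entries, for illustration only.
sample_hit = {
    "resource": {"type": "container", "labels": {"cluster_name": "my-cluster"}},
    "logName": "projects/my-project/logs/k8s-snapshots",
    "jsonPayload": {"event": "snapshot.created"},
}
sample_miss = dict(sample_hit, jsonPayload={"event": "snapshot.start"})

print(matches_snapshot_created(sample_hit))
print(matches_snapshot_created(sample_miss))
```

Note that nothing in the filter refers to a specific Pod, so the filter by itself does not distinguish one Pod instance from another.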
We have a Stackdriver Policy:
Policy(
Name: "snapshot metric absent",
Condition: Condition(
Metric("k8s-snapshots/snapshot-created"),
is absent for: "more than 30 minutes"
)
)
The policy is meant to detect when Container("k8s-snapshots")
has stopped creating snapshots.
Expected result
An alert is triggered only if no instance of Pod("k8s-snapshots-0")
has logged any event matching Metric("k8s-snapshots/snapshot-created")
for more than 30 minutes.
Result
Policy(Name: "snapshot metric absent")
is violated each time Pod("k8s-snapshots-0")
is rescheduled.
It seems that a separate sub-metric of the main logs-based metric is created for each instance of Pod("k8s-snapshots-0"),
and Stackdriver alerts on each sub-metric independently.
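The suspected behaviour can be sketched as a small plain-Python simulation. It assumes absence is evaluated separately for each time series, keyed here by a hypothetical pod instance id, rather than over the metric as a whole. This is not Stackdriver's actual evaluation logic; the function names, keys, and timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

ABSENT_FOR = timedelta(minutes=30)


def absent_series(points, now, absent_for=ABSENT_FOR):
    """Per-series check: return keys of series with no point in the window."""
    return sorted(
        key for key, stamps in points.items()
        if not stamps or now - max(stamps) > absent_for
    )


def absent_overall(points, now, absent_for=ABSENT_FOR):
    """Aggregated check: True only if no series has a point in the window."""
    latest = [max(stamps) for stamps in points.values() if stamps]
    return not latest or now - max(latest) > absent_for


# Hypothetical data: the Pod is rescheduled at t0, so the old instance's
# series stops receiving points and a new series starts.
t0 = datetime(2018, 1, 1, 12, 0)
points = {
    "pod-instance-a": [t0 - timedelta(minutes=20), t0 - timedelta(minutes=10)],
    "pod-instance-b": [t0 + timedelta(minutes=10), t0 + timedelta(minutes=25)],
}
now = t0 + timedelta(minutes=30)

print(absent_series(points, now))   # ['pod-instance-a']
print(absent_overall(points, now))  # False
```

Under these assumptions the per-series check flags the old instance 30 minutes after its last point, even though snapshots are still being created by the new instance. The aggregated check matches the expected result: it would fire only if no instance had logged an event in the window.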