I'm trying to create a Stackdriver alert policy with a Deployment Manager configuration. The same configuration first creates a resource group and a notification channel and then a policy based on those:

resources:
- name: test-group
  type: gcp-types/monitoring-v3:projects.groups
  properties:
    displayName: A test group
    filter: >-
        resource.metadata.cloud_account="aproject-id" AND
        resource.type="gce_instance" AND
        resource.metadata.tag."managed"="yes"

- name: test-email-notification
  type: gcp-types/monitoring-v3:projects.notificationChannels
  properties:
    displayName: A test email channel
    type: email
    labels:
      email_address: incidents@example.com

- name: test-alert-policy
  type: gcp-types/monitoring-v3:projects.alertPolicies
  properties:
    enabled: true
    displayName: A test alert policy
    documentation:
      mimeType: text/markdown
      content: "Test incident"
    notificationChannels:
      - $(ref.test-email-notification.name)
    combiner: OR
    conditions:
    - conditionAbsent:
        aggregations:
        - alignmentPeriod: 60s
          perSeriesAligner: ALIGN_RATE
        duration: 300s
        filter: metric.type="compute.googleapis.com/instance/uptime" group.id="$(ref.test-group.id)"
        trigger:
          count: 1
      displayName: The instance is down

The policy's only condition has a filter based on the resource group, so only members of the group can trigger this alert.

I'm trying to use a reference to the group's ID, but it doesn't work: "The reference 'id' is invalid, reason: The field 'id' does not exists on the reference schema."

Also, when I try to use $(ref.test-group.selfLink), I get: "The reference 'selfLink' is invalid, reason: The field 'selfLink' does not exists on the reference schema."

I could get the group's name (e.g. "projects/aproject-id/groups/3691870619975147604") but the filters only accept group IDs (e.g. only the "3691870619975147604" part):

'{"ResourceType":"gcp-types/monitoring-v3:projects.alertPolicies","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"Field alert_policy.conditions[0].condition_absent.filter had an invalid value of \"metric.type=\"compute.googleapis.com/instance/uptime\" group.id=\"projects/aproject-id/groups/3691870619975147604\"\": must specify a restriction on \"resource.type\" in the filter; see \"https://cloud.google.com/monitoring/api/resources\" for a list of available resource types.","status":"INVALID_ARGUMENT","statusMessage":"Bad Request","requestPath":"https://monitoring.googleapis.com/v3/projects/aproject-id/alertPolicies","httpMethod":"POST"}}'
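
To make the difference between the two values concrete: going by the examples above, the group ID appears to be just the last path segment of the group's name. Here is a minimal Python sketch of that relationship. The helper name `group_id_from_name` is hypothetical, and this is plain Python run outside Deployment Manager; as far as I can tell it can't be applied to a `$(ref...)` value, since those are only resolved at deploy time.

```python
def group_id_from_name(group_name: str) -> str:
    """Return the trailing ID of a 'projects/<project>/groups/<id>' name.

    Hypothetical helper for illustration only; not part of any Google API.
    """
    prefix, sep, group_id = group_name.rpartition("/groups/")
    # Reject anything that does not look like a fully qualified group name.
    if not sep or not prefix.startswith("projects/") or not group_id:
        raise ValueError(f"unexpected group name: {group_name!r}")
    return group_id


name = "projects/aproject-id/groups/3691870619975147604"
print(group_id_from_name(name))  # prints 3691870619975147604
```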

Milen A. Radev
  • Are you sure about using the group ID? The error is complaining about not having a restriction on the resource.type (must specify a restriction on "resource.type" in the filter). – Kirk Kelsey Aug 09 '19 at 17:52
  • You're right, my bad - as [Aleksi](https://stackoverflow.com/users/1763012/aleksi)'s answer below shows, that error goes away when `resource.type="gce_instance"` is added to the condition's filter. – Milen A. Radev Aug 11 '19 at 10:43

1 Answer

Try replacing your alert policy with the following:

- name: test-alert-policy
  type: gcp-types/monitoring-v3:projects.alertPolicies
  properties:
    enabled: true
    displayName: A test alert policy
    documentation:
      mimeType: text/markdown
      content: "Test incident"
    notificationChannels:
      - $(ref.test-email-notification.name)
    combiner: OR
    conditions:
    - conditionAbsent:
        aggregations:
        - alignmentPeriod: 60s
          perSeriesAligner: ALIGN_RATE
        duration: 300s
        filter: metric.type="compute.googleapis.com/instance/uptime" $(ref.test-group.filter)
        trigger:
          count: 1
      displayName: The instance is down
  metadata:
    dependsOn:
    - test-group

This makes two changes: 1) it adds an explicit dependency on test-group with a dependsOn clause, and 2) it appends $(ref.test-group.filter) to the metric filter, so that the policy, while not strictly linked to test-group, ends up matching the same resources as test-group.

Because Deployment Manager creates resources in parallel, it's necessary to use dependsOn to ensure test-group is instantiated before test-alert-policy is created; apparently Deployment Manager isn't quite smart enough to infer this from the references alone.
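
To illustrate what the condition's filter should look like once the reference is resolved, here is a small Python sketch that performs the same substitution by hand. The function name `compose_policy_filter` and the two constants are made up for illustration; Deployment Manager does the substitution itself at deploy time, and the group filter string is simply the folded value from the question's config.

```python
# Metric restriction taken from the alert policy's condition.
METRIC_FILTER = 'metric.type="compute.googleapis.com/instance/uptime"'

# The group's filter as it reads after YAML folds the ">-" block into one line.
GROUP_FILTER = (
    'resource.metadata.cloud_account="aproject-id" AND '
    'resource.type="gce_instance" AND '
    'resource.metadata.tag."managed"="yes"'
)


def compose_policy_filter(metric_filter: str, group_filter: str) -> str:
    """Join the metric restriction and the group's filter with a single space,
    mirroring 'metric.type="..." $(ref.test-group.filter)' in the config."""
    return f"{metric_filter} {group_filter}"


policy_filter = compose_policy_filter(METRIC_FILTER, GROUP_FILTER)
print(policy_filter)
```

Note that the composed filter carries the `resource.type="gce_instance"` restriction from the group's filter, which is what the 400 error in the question was asking for.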

Aleksi
  • A step forward - now the deployment is successful, but the created policy is still broken, i.e. the filter contains `group.id="projects/aproject-id/groups/5310387734849288536"` and it doesn't generate alerts. A working policy's condition filter contains `group.id="5310387734849288536"` for the same criteria. – Milen A. Radev Aug 11 '19 at 11:20
  • Great, getting there! Hmm... one workaround would be to set the policy filter in the Deployment Manager config to `metric.type="compute.googleapis.com/instance/uptime" $(ref.test-group.filter)`; the created policy, while not strictly linked to the group, then ends up matching the same resources as the group. That is, the realized policy's filter looks something like `metric.type="compute.googleapis.com/instance/uptime" resource.metadata.cloud_account="..." AND resource.type="gce_instance" AND resource.metadata.tag."managed"="yes"`. – Aleksi Aug 11 '19 at 13:05
  • This worked - the policy's filter replicates the group's one. It's not what I intended (tying policies to groups), but it achieves the same goal - "DRY". Please add this to your answer and I'll accept it. – Milen A. Radev Aug 12 '19 at 16:44