I want to be able to specify all my rules for, say, prometheus-blackbox-exporter, so I have added them to a rules-mine.yaml file and deployed with:

helm upgrade --install -n monitoring blackbox -f values.yaml -f rules-mine.yaml .

I cannot see any rules listed at http://localhost:9090/rules, and nothing seems to be evaluated, as no alerts fire. I need to do everything as IaC and deploy through Terraform in an automated fashion.

  • Is it possible to add rules to exporters in this manner?
  • If so, then can anyone see a problem with the file below?
  • If not, how can I add rules to many exporters efficiently?

The rules-mine.yaml file contains:

prometheusRule:
  enabled:  true
  namespace: monitoring
  additionalLabels:
    team: foxtrot_blackbox
    environment: production
    cluster: cluster
    namespace: namespace_x
  namespace: "monitoring"

  rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox probe failed (instance {{`{{`}} $labels.instance {{`}}`}})
      description: "Probe failed\n  VALUE = {{`{{`}} $value {{`}}`}}"

  - alert: BlackboxSlowProbe
    expr: avg_over_time(probe_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Blackbox slow probe (instance {{`{{`}} $labels.instance {{`}}`}})
      description: "Blackbox probe took more than 1s to complete\n  VALUE = {{`{{`}} $value {{`}}`}}"

Thanks for your help....

Andrew

3 Answers

The best approach I found was to add the exporter rules to the kube-prometheus-stack values (I actually created a separate rules file rather than editing values.yaml) and feed it to helm:

  • helm upgrade --install -n monitoring prometheus --create-namespace -f values-mine.yaml -f rules-mine.yaml prometheus-community/kube-prometheus-stack

All the rules are then picked up as I wanted, and this seems to be an OK solution. I would still prefer to keep them grouped with the exporter; if I find a solution for that, I'll post again.

additionalPrometheusRulesMap:
  prometheus.rules:
    groups:
    - name: company.prometheus.rules
      rules:
      - alert: PrometheusNotificationsBacklog
        expr: min_over_time(prometheus_notifications_queue_length[10m]) > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Prometheus notifications backlog (instance {{ $labels.instance }})
          description: "The Prometheus notification queue has not been empty for 10 minutes\nVALUE = {{ $value }}"
          dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
          runbook_url: ${wiki_url}/{{ $labels.alertname }}

  company.blackbox.rules:
    groups:
    - name: company.blackbox.rules
      rules:
      - alert: BlackboxProbeFailed
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Blackbox probe failed (instance {{ $labels.instance }})
          description: "Probe failed\nVALUE = {{ $value }}"
          dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
          runbook_url: ${wiki_url}/{{ $labels.alertname }}

      - alert: BlackboxSlowProbe
        expr: avg_over_time(probe_duration_seconds[1m]) > 1
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: Blackbox slow probe (instance {{ $labels.instance }})
          description: "Blackbox probe took more than 1s to complete\nVALUE = {{ $value }}"
          dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
          runbook_url: ${wiki_url}/{{ $labels.alertname }}

# etc....
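
If the rules are kept in the exporter's own values instead, one thing worth checking is whether Prometheus selects the resulting PrometheusRule at all. A minimal sketch, assuming a default kube-prometheus-stack install under the release name prometheus (as in the command above), where Prometheus by default only loads PrometheusRule objects carrying a matching release label. The label value here is an assumption and should match your own release name:

```yaml
# Hypothetical fragment for the exporter's rules-mine.yaml.
# Assumption: kube-prometheus-stack was installed as release "prometheus"
# and its rule selector is left at the chart default.
prometheusRule:
  enabled: true
  namespace: monitoring
  additionalLabels:
    release: prometheus   # should equal the kube-prometheus-stack release name
```

Alternatively, setting prometheus.prometheusSpec.ruleSelectorNilUsesHelmValues to false in the kube-prometheus-stack values should make Prometheus load rules regardless of that label.
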
Andrew
  • I'm trying to do something similar, and my custom rules.yaml has the same structure, but the rules are not picked up when I look at the Prometheus dashboard, even though a PrometheusRule resource was created for the additional rules. Did you manage to solve it? – everspader Apr 13 '22 at 13:53
A colleague found that this is entirely possible. The problem seemed to be the quoting used in the original implementation. The following is now in use and working, so I am posting it here in the hope that it will be useful to others.

In summary,

  • {{`{{`}} $labels.instance {{`}}`}} == BAD
  • {{`{{$labels.instance}}`}} == GOOD
prometheusRule:
  enabled: true
  additionalLabels:
    client: ${client_id}
    cluster: ${cluster}
    environment: ${environment}
    grafana: ${grafana_url}

  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: Blackbox probe failed for {{`{{$labels.instance}}`}}
        description: Probe failed VALUE = {{`{{$value}}`}}
        dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
        runbook_url: ${wiki_url}/BlackboxProbeFailed

    - alert: BlackboxSlowProbe
      expr: avg_over_time(probe_duration_seconds[1m]) > 1
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Blackbox slow probe for {{`{{$labels.instance}}`}}
        description: Blackbox probe took more than 1s to complete VALUE = {{`{{$value|humanizeDuration}}`}}
        dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
        runbook_url: ${wiki_url}/BlackboxSlowProbe

Please ignore any missing variables, etc.
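
To make the difference concrete, it may help to compare what is written in the values file with what Prometheus should receive. This is only a sketch, assuming the chart passes the rules through Helm's template engine (which is what makes the escaping necessary in the first place). The backtick-quoted section is a Go template raw string, so Helm emits its contents literally and the inner braces are left for Prometheus to expand:

```yaml
# As written in the Helm values file:
summary: Blackbox probe failed for {{`{{$labels.instance}}`}}
---
# As expected in the rendered PrometheusRule (assumed output, not captured from a real run):
summary: Blackbox probe failed for {{$labels.instance}}
```

Running helm template with the same -f files prints the rendered manifests without deploying, which is a quick way to check the result.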

Andrew

Are you sure you haven't made a typo in the label name "environmment"? That will surely not match what you are expecting, unless you actually labelled your source that way.

best

cuyan