4

I get the following error when unit-testing my alerting rules with promtool:

 promtool check rules /etc/prometheus/alert.rules.yml
 Checking /etc/prometheus/alert.rules.yml
 SUCCESS: 3 rules found

 promtool test rules /etc/prometheus/alert.rules.yml
 Unit Testing:  /etc/prometheus/alert.rules.yml
 FAILED:
 yaml: unmarshal errors:
 line 1: field groups not found in type main.unitTestFile

My alert.rules.yml configuration is below:

      cat /etc/prometheus/alert.rules.yml
      groups:
      - alert: MemoryFree10%
        expr: node_exporter:node_memory_free:memory_used_percents >= 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} hight memory usage"
          description: "{{ $labels.instance }} has more than 90% of its memory used."
      - alert: DiskSpace10%Free
        expr: node_exporter:node_filesystem_free:fs_used_percents >= 90
        labels:
          severity: moderate
        annotations:
          summary: "Instance {{ $labels.instance }} is low on disk space"
          description: "{{ $labels.instance }} has only {{ $value }}% free."
      - alert: ExporterDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Exporter down (instance {{ $labels.instance }})"
          description: "Prometheus exporter down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

Is anything missing or incorrect in our alert rules file?

Thanks

Kalana

3 Answers

1

To check the syntax of the main config file with promtool, use "./promtool check config prometheus.yml". prometheus.yml is the parent file, which references the rule file prometheus_rules.yml. To check the syntax of the rule file itself, use "./promtool check rules prometheus_rules.yml".
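
As a rough illustration of that parent/child relationship, a minimal prometheus.yml might reference the rule file as shown here. The scrape job and target are placeholders assumed for the sketch, not taken from the question.

```yaml
# prometheus.yml: the parent config, validated with
#   promtool check config prometheus.yml
global:
  evaluation_interval: 1m

# Rule files are listed here; each can also be validated on its own with
#   promtool check rules prometheus_rules.yml
rule_files:
  - prometheus_rules.yml

# Placeholder scrape job (assumed for illustration only).
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
```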

MeowRude
0

You are running the unit test on the alerting rule file itself. promtool test rules expects a test file, not a rule file, so write a test file first and then run the unit test on it with promtool test rules test.yml.

Here is a demo from https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/

alerts.yml

# This is the rules file.

groups:
- name: example
  rules:

  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
        severity: page
    annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

  - alert: AnotherInstanceDown
    expr: up == 0
    for: 10m
    labels:
        severity: page
    annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

test.yml

# This is the main input for unit testing.
# Only this file is passed as command line argument.

rule_files:
    - alerts.yml

evaluation_interval: 1m

tests:
    # Test 1.
    - interval: 1m
      # Series data.
      input_series:
          - series: 'up{job="prometheus", instance="localhost:9090"}'
            values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
          - series: 'up{job="node_exporter", instance="localhost:9100"}'
            values: '1+0x6 0 0 0 0 0 0 0 0' # 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
          - series: 'go_goroutines{job="prometheus", instance="localhost:9090"}'
            values: '10+10x2 30+20x5' # 10 20 30 30 50 70 90 110 130
          - series: 'go_goroutines{job="node_exporter", instance="localhost:9100"}'
            values: '10+10x7 10+30x4' # 10 20 30 40 50 60 70 80 10 40 70 100 130

      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 10m
            alertname: InstanceDown
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                      instance: localhost:9090
                      job: prometheus
                  exp_annotations:
                      summary: "Instance localhost:9090 down"
                      description: "localhost:9090 of job prometheus has been down for more than 5 minutes."
      # Unit tests for promql expressions.
      promql_expr_test:
          # Unit test 1.
          - expr: go_goroutines > 5
            eval_time: 4m
            exp_samples:
                # Sample 1.
                - labels: 'go_goroutines{job="prometheus",instance="localhost:9090"}'
                  value: 50
                # Sample 2.
                - labels: 'go_goroutines{job="node_exporter",instance="localhost:9100"}'
                  value: 50

Then run promtool test rules test.yml, and you will get a result like:

Unit Testing:  test.yml
  SUCCESS
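
The same approach can be applied to the ExporterDown rule from the question. This is a minimal sketch under several assumptions: the rule file is wrapped in a group with name and rules, the rule has only the summary annotation, and the test lives in a hypothetical file alert.rules.test.yml next to alert.rules.yml. The series labels are made up for the test. Any extra annotation on the rule (such as description) would most likely have to be listed under exp_annotations as well.

```yaml
# alert.rules.test.yml (hypothetical name): this file, not the rule file,
# is what gets passed to promtool test rules.
rule_files:
  - alert.rules.yml   # assumed to sit in the same directory

evaluation_interval: 1m

tests:
  - interval: 1m
    # The exporter reports up == 0 for the whole test window (0m..10m).
    input_series:
      - series: 'up{job="node_exporter", instance="localhost:9100"}'
        values: '0 0 0 0 0 0 0 0 0 0 0'

    alert_rule_test:
      # The rule uses "for: 5m", so the alert should be firing by 10m.
      - eval_time: 10m
        alertname: ExporterDown
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: localhost:9100
              job: node_exporter
            exp_annotations:
              summary: "Exporter down (instance localhost:9100)"
```

Run it with promtool test rules alert.rules.test.yml rather than passing the rule file itself.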
Hongbo Miao
-1

Your configuration is missing the name and rules keys; each entry under groups needs both:

    groups:
    - name: alert.rules
      rules:
      - alert: HighRequestLatency
      .....

https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
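
As a sketch, the file from the question with the group wrapper added could look like this. The group name alert.rules is arbitrary and comes from the snippet above; the three rules are copied unchanged from the question.

```yaml
groups:
- name: alert.rules   # arbitrary group name
  rules:
  - alert: MemoryFree10%
    expr: node_exporter:node_memory_free:memory_used_percents >= 90
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} hight memory usage"
      description: "{{ $labels.instance }} has more than 90% of its memory used."
  - alert: DiskSpace10%Free
    expr: node_exporter:node_filesystem_free:fs_used_percents >= 90
    labels:
      severity: moderate
    annotations:
      summary: "Instance {{ $labels.instance }} is low on disk space"
      description: "{{ $labels.instance }} has only {{ $value }}% free."
  - alert: ExporterDown
    expr: up == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Exporter down (instance {{ $labels.instance }})"
      description: "Prometheus exporter down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
```

This layout is what promtool check rules validates. promtool test rules still expects a separate test file that lists this file under rule_files.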

Amjad Hussain Syed
  • Hi Amjad, thanks for the reply. I already tested with rules included and still get the error:

        groups:
        - name: alerting.rules
          rules:
          - alert: ExporterDown
            expr: up == 0
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Exporter down (instance {{ $labels.instance }})"

        promtool test rules /etc/prometheus/alert.rules.yml
        Unit Testing: /etc/prometheus/alert.rules.yml
        FAILED:
        yaml: unmarshal errors:
        line 1: field groups not found in type main.unitTestFile

    – Fajar Hadiyanto Sep 22 '19 at 12:37
  • I added your complete yml file and ran promtool, and everything succeeded. Are you sure your YAML is correct and properly formatted?

        groups:
        - name: rules
          rules:
          - alert: MemoryFree10%
            expr: node_exporter:node_memory_free:memory_used_percents >= 90
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Instance {{ $labels.instance }} hight memory usage"
              description: "{{ $labels.instance }} has more than 90% of its memory used."

    – Amjad Hussain Syed Sep 22 '19 at 12:57
  • Hmm, strange. I keep getting the same error: FAILED: yaml: unmarshal errors: line 1: field groups not found in type main.unitTestFile – Fajar Hadiyanto Sep 22 '19 at 14:17
  • Could you put it in a gist? – Michael Doubez Sep 23 '19 at 15:18
  • Sorry @MichaelDoubez, what does "gist" mean? – Fajar Hadiyanto Sep 23 '19 at 15:33