I'm defining a data series for testing a Prometheus alert using the container_last_seen metric from the cadvisor exporter.

How do I enter timestamp values for a series, as returned by the container_last_seen metric? I'm testing the alerts on an Apple Mac; in production they run on Linux boxes.

Here's one thing I tried:

    input_series:
      - series: |
          container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
        values: '1563968832+0x61'

It seems whatever I put in the values for the series is not accepted.

I've also tried durations: '0h+1mx60'

Since time() - container_last_seen{...} is a legal expression, container_last_seen is definitely a timestamp, and I would expect a timestamp to be represented by a Unix epoch number. Executing the query on Prometheus returns Unix epoch times, but putting such numbers in a series is rejected with the error below.

promtool is recognising the different types but giving much the same error:

➜ promtool test rules alertrules-service-oriented-test.yml
Unit Testing:  alertrules-service-oriented-test.yml
  FAILED:
1:1: parse error: unexpected number "0" in series values

If the values are '1h+0mx61', promtool correctly identifies the values as durations:

1:1: parse error: unexpected duration "1h" in series values

Note that when this test is commented out, the 1:1: parse error disappears and the remaining tests complete successfully, so the problem is not in parts of the test file that aren't shown here.

Thanks for any insights.

Here's the alert:

alertrules.yml:

  - name: containers
    interval: 15s
    rules:
      - alert: prod_container_crashing
        expr: |
          count by (instance, container_label_com_docker_swarm_service_name)
          (
              count_over_time(container_last_seen{container_label_com_docker_swarm_service_name!="",env="prod"}[15m])
          ) - 1 > 2
        for: 5m
        labels:
          service: prod
          type: container
          severity: critical
        annotations:
          summary: "pdce {{ $labels.container_label_com_docker_swarm_service_name }}"
          description: "{{ $labels.container_label_com_docker_swarm_service_name }} in prod cluster on {{ $labels.instance }} is crashing"

and here's the test file:

alertrules_test.yml:

rule_files:
  - alertrules.yml

evaluation_interval: 1m

tests:

  - name: container_tests
    interval: 15s
    input_series:
      - series: |
          container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
        values: '1563968832+0x61'
    alert_rule_test:
      - eval_time: 15m
        alertname: prod_container_crashing
        exp_alerts:
          - exp_labels:
              service: prod
              type: container
              severity: critical
            exp_annotations:
              summary: prod service1
              description: service1 in prod cluster on 10.0.0.1 is crashing

Nic

1 Answer

When the series: value is all on one line, without a > or | YAML block scalar indicator, e.g.

      - series: container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
        values: '1563968832+0x61'

the error goes away, though I don't know why. So this doesn't appear to be a data typing issue.

That's a shame for readability. Either Prometheus or Go may have a squeaky wheel in its YAML handling.
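
One possible explanation, offered as an untested guess: a | block scalar keeps its final line break, so the series string reaches promtool with a trailing newline that the one-line form does not have. If that is the cause, adding the YAML strip indicator (|-) should produce the same string as the one-line form while keeping the block layout. A minimal sketch, reusing the series from the question:

```yaml
    input_series:
      # '|-' is a literal block scalar with the trailing newline stripped,
      # so the parsed value should match the one-line form shown above.
      - series: |-
          container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
        values: '1563968832+0x61'
```

I have not run this variant through promtool, so treat it as something to try rather than a confirmed fix.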

Nic