I am trying to scrape Traefik metrics with Prometheus.

Traefik (latest) runs as a service on a swarm cluster, with Prometheus metrics enabled. The matching endpoint is 10.200.1.1:8088/metrics

When I open the endpoint in a browser, I see the expected metrics:

...
# HELP traefik_config_last_reload_failure Last config reload failure
# TYPE traefik_config_last_reload_failure gauge
traefik_config_last_reload_failure 0
# HELP traefik_config_last_reload_success Last config reload success
# TYPE traefik_config_last_reload_success gauge
traefik_config_last_reload_success 1.53633684e+09
# HELP traefik_config_reloads_failure_total Config failure reloads
# TYPE traefik_config_reloads_failure_total counter
traefik_config_reloads_failure_total 0
# HELP traefik_config_reloads_total Config reloads
# TYPE traefik_config_reloads_total counter
traefik_config_reloads_total 76
...
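For checking the same thing from a script rather than a browser, here is a minimal sketch that extracts metric names from output in the text format shown above. The helper name `metric_names` and the `SAMPLE` string are made up for illustration; only the metric lines themselves come from the output above.

```python
def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is either "name value" or "name{labels} value".
        names.add(line.split("{", 1)[0].split()[0])
    return names


# Sample copied from the endpoint output above.
SAMPLE = """\
# HELP traefik_config_reloads_total Config reloads
# TYPE traefik_config_reloads_total counter
traefik_config_reloads_total 76
"""

if __name__ == "__main__":
    print(sorted(metric_names(SAMPLE)))
```

If the names returned for the real endpoint include the `traefik_*` series, the exporter side is probably fine and the problem lies between Prometheus and the target.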

So, as I understand it, editing prometheus.yml as follows (and POSTing to /-/reload) should add these metrics.

global:
  scrape_interval:     15s

rule_files:
  - "targets.rules"
  - "host.rules"
  - "containers.rules"

scrape_configs:

...

  - job_name: 'traefik'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['10.200.1.2:8088']
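The reload step mentioned above could be scripted roughly as follows. This is only a sketch: the address `http://localhost:9090` is an assumption (adjust it to your deployment), the function names are made up, and as far as I know Prometheus 2.x only serves `/-/reload` when it was started with `--web.enable-lifecycle`.

```python
import urllib.request

# Assumption: Prometheus listens here; change this to match your setup.
PROMETHEUS_URL = "http://localhost:9090"


def build_reload_request(base_url=PROMETHEUS_URL):
    """Build the POST request for the /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body is enough; the endpoint only cares about the method.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url=PROMETHEUS_URL, timeout=5):
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on a non-2xx answer, for example
    when the lifecycle endpoints are disabled.
    """
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `reload_config()` should return 200 when the configuration was reloaded; whether the new job was actually picked up still has to be checked separately.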

Unfortunately, none of them appear in the Prometheus metrics drop-down list.

Since I am new to Traefik and Prometheus, I am fairly sure I have misunderstood something. I tried to follow a few guides (such as this one), but could not get it to work (it may have worked with the previous version).

So... does anyone have an idea of what I am doing wrong and/or what the correct way is?

Marvin
  • Can you please have a look in the Prometheus UI under Status -> Targets? Do you see a job named "traefik" there? Any errors? Is the configuration loaded properly when you check under Status -> Configuration? – Andreas Jägle Sep 07 '18 at 22:56
  • @AndreasJägle Yes, I see the traefik "target", and the matching endpoint is OK, as is the last scrape (no error, and recent). The configuration is also updated correctly. But even though everything looks fine, I don't see any of my metrics in the drop-down list. Could it be something related to the formatting of Traefik's metrics? – Marvin Sep 10 '18 at 07:15
  • Leaving aside the Prometheus drop-down list, what do you get when you query for `{job="traefik"}`? You should at the very least get a result of `up{...job="traefik"...} 1`, but it would be weird if that was all you got. – Alin Sînpălean Sep 10 '18 at 12:15
  • @AlinSînpălean Works perfectly!... Yes, that's odd... but your questions eventually led me to the source of the problem: one of the swarm managers was corrupted and was only randomly accessible. Switching my manager to another node fixed everything! – Marvin Sep 10 '18 at 14:26
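The `{job="traefik"}` check suggested in the comments can also be run against the Prometheus HTTP API instead of the UI. The sketch below assumes the standard instant-query endpoint `/api/v1/query`; the function names and the base URL you pass in are illustrative, not taken from the question.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, expr):
    """Build the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def series_names(payload):
    """Return the sorted, distinct __name__ values of an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    results = payload["data"]["result"]
    return sorted({r["metric"].get("__name__", "") for r in results})


def run_query(base_url, expr, timeout=5):
    """Execute the query against a live Prometheus and return the decoded JSON."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `series_names(run_query("http://localhost:9090", '{job="traefik"}'))` (with the address being an assumption) lists every metric name currently stored for the job. If only `up` comes back, the target is being scraped but nothing useful is being ingested.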

1 Answer

After a while, many attempts, and some pertinent questions later, I concluded that the problem was not my configuration. Since I had also observed some random odd behavior (such as 503 errors on my remote /providers call), I started to suspect that the problem was related to access to my machine.

So I tried demoting the manager and promoting another node of the swarm instead... and it worked! My Traefik metrics now appear in Prometheus!

I still have to work out what is wrong with my former manager, but at least I am making progress!

Thanks @AlinSînpălean & @AndreasJägle for your help!

Marvin