
I want to use Prometheus to scrape metrics from my distributed web service. I have four kinds of services set up with docker-compose or Kubernetes:

- Flask: 5000
- Redis queue: 6379
- Prometheus
- Workers: horizontally scaled based on system load; they get their work instructions over the Redis queue.

It is straightforward to scrape metrics from Flask. However, what is the best practice for getting metrics from the workers? I cannot bind a fixed port to them, because I do not know how many of them exist.

I was thinking about using a Prometheus Pushgateway. However, as I found out, this is not recommended.

1 Answer


The answer depends on whether your workers' lifetime is long or short.

If a worker lives only to execute a single task and then quits, the Pushgateway is the correct way to send metrics from the worker to Prometheus.
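For the short-lived case, a sketch using the official `prometheus_client` Python library might look like the following. The metric names, the `job` label, and the gateway address `pushgateway:9091` are assumptions for illustration; substitute your own.

```python
# Sketch: a short-lived worker records metrics in its own registry and
# pushes them once to a Pushgateway just before exiting.
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
duration = Gauge('worker_task_duration_seconds',
                 'Time spent processing the task', registry=registry)
last_success = Gauge('worker_last_success_unixtime',
                     'Unix time of the last successful task', registry=registry)

def run_task():
    """Placeholder for the actual work pulled from the Redis queue."""
    with duration.time():          # records elapsed time into the gauge
        pass                       # ... process one message ...
    last_success.set_to_current_time()

def push_metrics(gateway='pushgateway:9091'):
    # Push the whole registry once; the Pushgateway holds the values
    # until Prometheus scrapes them.
    push_to_gateway(gateway, job='worker', registry=registry)

run_task()
# push_metrics()  # enable when a Pushgateway is actually reachable
```

Pushing once at exit (rather than continuously) keeps the Pushgateway's role limited to what it is designed for: metrics from jobs that do not live long enough to be scraped.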

If a worker lives for at least two Prometheus scrape periods (the scrape interval is configurable), you can definitely open a port on the worker and have Prometheus scrape metrics from a dedicated endpoint.

Prometheus's default scrape configuration comes with a scrape job that will scrape any pod with the following annotation:

prometheus.io/scrape: "true"

It also derives the scrape endpoint from the following annotations on the pod:

prometheus.io/scheme: http
prometheus.io/path: /metrics
prometheus.io/port: "3000"

So you can easily annotate worker pods with the above annotations to direct Prometheus to scrape metrics from them.
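Put together, a worker pod spec carrying these annotations might look like the sketch below. The image name and port are placeholders for your own worker; note that Kubernetes annotation values must be strings, so `"true"` and `"3000"` are quoted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
  annotations:
    prometheus.io/scrape: "true"   # annotation values must be strings
    prometheus.io/scheme: http
    prometheus.io/path: /metrics
    prometheus.io/port: "3000"
spec:
  containers:
    - name: worker
      image: my-registry/worker:latest   # placeholder image
      ports:
        - containerPort: 3000            # must match prometheus.io/port
```

In a real deployment these annotations would usually live in the pod template of a Deployment, so every horizontally scaled replica inherits them.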

Erez Rabih
  • Can you clarify how we can make the worker live for more than two Prometheus scrape periods? – Juan Benitez May 20 '22 at 09:25
  • The worker is 100% in your control, so it depends on the way you implement it: if you have a worker that constantly consumes messages and executes upon them, it will probably live long enough. If your worker processes a single message and goes down, it might not live long enough. – Erez Rabih May 22 '22 at 09:09