
I have set up the PrometheusPushGatewayReporter as described in the metrics section of the Flink documentation.

I can see the metrics from the Flink jobmanager and taskmanagers in the push gateway's UI, and they are scraped correctly by the Prometheus cluster.

The issue is that even though I have explicitly set the deleteOnShutdown config option, only the jobmanager's metrics are deleted when the job is cancelled through the Flink CLI tool.

Is there a way to also delete the stale taskmanager metrics? My configuration is as follows:

metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: $PUSH_GATEWAY_HOST
metrics.reporter.promgateway.port: 80
metrics.reporter.promgateway.jobName: foo
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: true
metrics.reporter.promgateway.interval: 60 SECONDS

I am using Flink 1.7.1 on Hadoop 2.6.0.

Michael Doubez
Spyros Mandekis

1 Answer


We ran into the same problem in our production environment. It would be very useful if the pushgateway implemented a TTL for pushed metrics [1]. For now, we use an external scheduling system to check whether the Flink job is still alive and, if it is not, delete its metrics through the pushgateway's REST API [2].

[1] https://github.com/prometheus/pushgateway/issues/19

[2] https://github.com/prometheus/pushgateway#delete-method
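
The cleanup step described above could look roughly like the following. This is only a minimal sketch, not the script we actually run: it assumes the stale metrics live in groups addressed as `/metrics/job/<job_name>` (the DELETE method documented in [2]), and that each Flink process pushed under its own job name, which is what `randomJobNameSuffix: true` suggests. The helper names `group_url`, `delete_group`, and `cleanup`, and the `is_alive` callback, are hypothetical; how you decide that a job is dead depends on your scheduler.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def group_url(base_url, job, grouping=None):
    """Build the pushgateway URL that addresses one metric group.

    Label values containing '/' are not handled by this sketch.
    """
    parts = ["metrics", "job", quote(job, safe="")]
    for label, value in (grouping or {}).items():
        parts += [quote(label, safe=""), quote(value, safe="")]
    return base_url.rstrip("/") + "/" + "/".join(parts)


def delete_group(base_url, job, grouping=None, timeout=10):
    """Send an HTTP DELETE for one group and return the status code."""
    req = Request(group_url(base_url, job, grouping), method="DELETE")
    with urlopen(req, timeout=timeout) as resp:
        return resp.status


def cleanup(base_url, jobs, is_alive):
    """Delete the group of every job name for which is_alive(job) is False.

    `is_alive` is a caller-supplied check (hypothetical), e.g. a lookup
    in whatever system tracks running Flink jobs.
    """
    deleted = []
    for job in jobs:
        if not is_alive(job):
            delete_group(base_url, job)
            deleted.append(job)
    return deleted
```

An external scheduler would call something like `cleanup("http://" + host + ":80", known_job_names, is_alive)` periodically, where `known_job_names` is the list of suffixed job names you have recorded for the jobmanager and taskmanagers.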

lamber-ken
Thanks for the info! We ended up using [this](https://github.com/dinumathai/pushgateway) fork of the push gateway, which adds TTL support for metrics. – Spyros Mandekis Aug 22 '19 at 13:03