
I have some very basic questions about the pull mechanism for metrics and how Spark 3 applications can be monitored using Prometheus:

  1. Does the PrometheusServlet sink supported in Spark 3 contain all the metrics since the application start time? Are these metrics un-aggregated?
  2. Where/how is the metric information in the sink stored, and what actually happens when Prometheus scrapes the endpoint? If the endpoint exposes all the metric information since the application start time, wouldn't the memory spent storing these metrics be a concern for long-running Spark applications?
  3. Does Prometheus fetch all the metrics (since the application start time) on every re-scrape? If not, how does it know which metrics it last scraped?

Thanks.

soontobeared

1 Answer


You can just set it up and see for yourself. :)
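Setting it up is only a few lines in metrics.properties. A minimal sketch, following the sink configuration described on the monitoring page (the exact paths are up to you; the ones below match the documented defaults, and the port/host depend on your deployment):

    # conf/metrics.properties -- expose metrics in Prometheus format via the servlet sink
    *.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
    *.sink.prometheusServlet.path=/metrics/prometheus
    master.sink.prometheusServlet.path=/metrics/master/prometheus
    applications.sink.prometheusServlet.path=/metrics/applications/prometheus

The driver then serves its current metric values on its web UI (port 4040 by default) under that path.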

Anyway, how each metric behaves depends on the metric itself (whether it is a cumulative one or a snapshot), as described in https://spark.apache.org/docs/latest/monitoring.html (each metric has a label that states its type).

Past values of the metrics are not stored; it is up to Prometheus to fetch them periodically (which is what it does anyway, since it is pull-based). The servlet just formats the current metric values in a Prometheus-compatible way.
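So on the Prometheus side you just point a scrape job at that endpoint and pick how often to pull. A minimal prometheus.yml sketch (the job name and target host here are placeholders; 4040 is only the default driver UI port):

    # prometheus.yml -- pull the driver's servlet endpoint on a fixed interval
    scrape_configs:
      - job_name: 'spark-driver'            # hypothetical job name
        metrics_path: '/metrics/prometheus'
        scrape_interval: 15s
        static_configs:
          - targets: ['driver-host:4040']   # placeholder; point at your driver's UI address

The scraped samples are stored in Prometheus's own time-series database, so the history lives on the Prometheus side, not inside the Spark application, which is why a long-running job doesn't accumulate old metric values in memory.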

Arnon Rotem-Gal-Oz