I wrote a Spark application, which I compile with Maven and run with spark-submit. I want to monitor the application and collect metrics, so I set up a Prometheus container, but I'm struggling to expose even a simple metric to it. I tried to follow the answer here, but I didn't understand what I should do with the spark.yml file.

  • I have a Prometheus client that counts some stuff.
  • I uncommented *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink in spark/conf/metrics.properties.
  • I added the JMX Prometheus Javaagent to my pom.xml.

This is my prometheus.yml:

    global:
      scrape_interval:     15s
      evaluation_interval: 15s

    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ['localhost:9090']

    - job_name: spark-master
      static_configs:
      - targets: ['spark-master:8082']

When I look at the targets at http://localhost:9090/targets, I can see that the prometheus target is up but the spark-master target is down.

Oded

Answer

I think the answer depends on what you want to monitor in Spark 2.1.

  1. If it is JVM metrics, I don't think you can do that, for the simple reason that you do not know where the JVMs will be created in the Spark cluster. Even if you did know, several JVMs can be launched on the same node, so each JMX agent would need a dynamically assigned port, whereas the Prometheus server needs an exact scrape URL for every target, which you cannot know in advance.

  2. If the requirement is to measure business-specific metrics, then yes, you can do that with the Pushgateway, because the Prometheus server would then scrape a single, known URL.
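A minimal, dependency-free sketch of point 2, assuming a Pushgateway reachable at the hypothetical address http://pushgateway:9091 (9091 is the Pushgateway's default port); the class, job and metric names are illustrative only. It talks to the Pushgateway's HTTP API directly: a body in the Prometheus text exposition format is POSTed to /metrics/job/<job>, optionally followed by label/value pairs that form a grouping key. If you already use io.prometheus.client, its simpleclient_pushgateway module wraps the same API in a PushGateway class (for example pushAdd(registry, job)), which is probably the more natural choice next to an existing Counter.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Dependency-free sketch of pushing one counter to a Prometheus Pushgateway
// through its HTTP API. Kept Java 8 compatible so it also fits Spark 2.x.
// Hypothetical helper class; not part of Spark or the Prometheus client.
final class PushgatewaySketch {

    // Renders one counter in the Prometheus text exposition format.
    // Every line, including the last one, must end with '\n'.
    // Assumption: name and help contain no characters that need escaping.
    static String counterText(String name, String help, long value) {
        return "# HELP " + name + " " + help + "\n"
                + "# TYPE " + name + " counter\n"
                + name + " " + value + "\n";
    }

    // Builds <baseUrl>/metrics/job/<job>{/<label>/<value>}; the label/value
    // pairs form the grouping key. Assumption: all segments are plain names,
    // so no URL escaping is attempted and anything else is rejected.
    static String metricsUrl(String baseUrl, String job, Map<String, String> groupingKey) {
        StringBuilder url = new StringBuilder(baseUrl).append("/metrics/job/").append(plain(job));
        for (Map.Entry<String, String> e : groupingKey.entrySet()) {
            url.append('/').append(plain(e.getKey())).append('/').append(plain(e.getValue()));
        }
        return url.toString();
    }

    private static String plain(String segment) {
        if (!segment.matches("[A-Za-z0-9_.-]+")) {
            throw new IllegalArgumentException("unsupported path segment: " + segment);
        }
        return segment;
    }

    // POST replaces only the metrics with the same names inside this group
    // (PUT would replace the whole group). Returns the HTTP status code.
    static int push(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/plain; version=0.0.4");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

As a usage sketch, the driver could call push(metricsUrl("http://pushgateway:9091", "spark_app", key), counterText("spark_app_objects_created_total", "Objects created by the app", count)), where key is a map such as {instance=driver}. The Prometheus server then needs only one fixed target in its scrape_configs: the Pushgateway itself.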

You may also want to look at a more recent version, Spark 3.0, which supports Prometheus. Please see https://spark.apache.org/docs/latest/monitoring.html

floating_hammer
  • Hi @floating_hammer, unfortunately I can't use Spark 3.x. My goal is to monitor custom metrics from my running Spark application (e.g. count how many times one of the classes has been created). How can I achieve that? – Oded Jun 10 '21 at 14:10
  • @Oded - If you are using custom metrics (as you mentioned), you can use the Prometheus Pushgateway. You would publish the metric to the Pushgateway, and the Prometheus server would scrape it from there. – floating_hammer Jun 10 '21 at 14:14
  • So far, I have created a Counter in my source code with io.prometheus.client. I used the JMX exporter, and my Prometheus container failed to scrape from it. Where does the Pushgateway fit into that flow? What does it do? – Oded Jun 10 '21 at 14:23
  • @Oded - The Prometheus Pushgateway is used for publishing custom, application-specific metrics to Prometheus. You can find more information at https://prometheus.io/docs/practices/pushing/ and on GitHub at https://github.com/prometheus/pushgateway. So the flow is: the application publishes to the Pushgateway, and the Prometheus server scrapes from the Pushgateway. – floating_hammer Jun 10 '21 at 15:46
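For the use case raised in the comments (counting how many times one of the classes is created), a minimal stdlib-only sketch could look like the following. TrackedThing and JvmInstanceLabel are hypothetical names; with the io.prometheus.client Counter the asker already has, calling inc() in the constructor would play the role of the AtomicLong. The second helper reflects the point made in the answer that several JVMs (driver and executors) may run on the same node: each JVM keeps its own static count, so each would presumably push under its own grouping key, e.g. /metrics/job/<job>/instance/<label>, so that their pushes do not overwrite one another.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class standing in for "one of the classes" whose creations
// should be counted. The counter is static, so it is shared by all instances
// in this JVM only: every Spark executor JVM keeps its own, separate count.
final class TrackedThing {
    private static final AtomicLong CREATED = new AtomicLong();

    TrackedThing() {
        CREATED.incrementAndGet(); // thread-safe increment on every construction
    }

    static long createdCount() {
        return CREATED.get();
    }
}

// Hypothetical helper that derives a per-JVM label value for the grouping key.
final class JvmInstanceLabel {
    // RuntimeMXBean.getName() typically returns "pid@hostname" on HotSpot/OpenJDK,
    // but the format is not guaranteed. Characters outside [A-Za-z0-9_.-] are
    // replaced with '_' so the result is safe as a Pushgateway URL path segment.
    static String current() {
        String raw = ManagementFactory.getRuntimeMXBean().getName();
        return raw.replaceAll("[^A-Za-z0-9_.-]", "_");
    }
}
```

The static counter keeps the counting logic out of Spark itself; the design choice to accept is that the numbers stay per JVM, and summing across executors would then happen in Prometheus (for example with sum() over the pushed series) rather than in the application.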