I posted the question below on the Spark user mailing list, but as usual there has been no response from the community.

What is the best way to instrument metrics for a Spark application from both the driver and the executors?

I am trying to send my Spark application metrics to Kafka, and I have found two approaches.

Approach 1: Implement a custom Source and Sink, and use the Source to instrument from both the driver and the executors (via SparkEnv.get.metricsSystem).

Approach 2: Write a Dropwizard/Gobblin KafkaReporter and use it for instrumentation from both the driver and the executors.

Which is the better approach? And if we go with Approach 1, how do we restrict the metrics to only those specific to our application?

I tried Approach 1, but when I launch my application all the containers get killed.

The steps I took are as follows:

  1. As there is no KafkaSink in org.apache.spark.metrics.sink, I implemented my own custom KafkaSink and KafkaReporter as suggested in https://github.com/erikerlandson/spark-kafka-sink (a sketch of the sink is included after this list).

  2. Implemented SparkMetricsSource by extending org.apache.spark.metrics.source.Source (also sketched after this list).

  3. Registered the source:

    val sparkMetricsSource = new SparkMetricsSource("spark.xyz.app.prefix")
    SparkEnv.get.metricsSystem.registerSource(sparkMetricsSource)
    
  4. Instrumented the metrics:

    sparkMetricsSource.registerGauge(spark.sparkContext.applicationId, schema, "app-start", System.currentTimeMillis)
    
  5. Configured the sink through Spark metrics properties (an example configuration follows the list).
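
For reference, here is a rough sketch of what I mean by the custom KafkaSink in step 1, modeled on the GitHub project linked above. The KafkaReporter used inside it is my own class (hypothetical here), the broker/topic/period property names are just what I chose, and the three-argument constructor is the signature Spark 2.x looks up reflectively (newer 3.x releases dropped the SecurityManager argument). Because Spark's Sink trait is package-private, the class has to live under the org.apache.spark package tree:

    package org.apache.spark.metrics.sink

    import java.util.Properties
    import java.util.concurrent.TimeUnit

    import com.codahale.metrics.MetricRegistry
    import org.apache.spark.SecurityManager

    // Spark instantiates sinks reflectively from the metrics properties,
    // passing the sink-specific properties and the shared MetricRegistry.
    class KafkaSink(val properties: Properties,
                    val registry: MetricRegistry,
                    securityMgr: SecurityManager) extends Sink {

      // These keys correspond to *.sink.kafka.broker / topic / period entries.
      private val broker = properties.getProperty("broker", "localhost:9092")
      private val topic  = properties.getProperty("topic", "spark-metrics")
      private val period = properties.getProperty("period", "10").toLong

      // KafkaReporter is my own Dropwizard ScheduledReporter (hypothetical) that
      // serializes the registry's metrics and produces them to the Kafka topic.
      private val reporter = new KafkaReporter(registry, broker, topic)

      override def start(): Unit  = reporter.start(period, TimeUnit.SECONDS)
      override def stop(): Unit   = reporter.stop()
      override def report(): Unit = reporter.report()
    }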
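
And here is roughly what my SparkMetricsSource looks like (step 2), with the registerGauge helper that step 4 calls; the helper's signature and the prefix handling are my own choices. Source is also package-private in the Spark versions I checked, so this class sits under org.apache.spark as well:

    package org.apache.spark.metrics.source

    import com.codahale.metrics.{Gauge, MetricRegistry}

    class SparkMetricsSource(prefix: String) extends Source {

      override val sourceName: String = prefix
      override val metricRegistry: MetricRegistry = new MetricRegistry

      // Registers a gauge named "<appId>.<schema>.<name>" that reports the
      // value captured at registration time (e.g. an app-start timestamp).
      def registerGauge(appId: String, schema: String, name: String, value: Long): Unit =
        metricRegistry.register(
          MetricRegistry.name(appId, schema, name),
          new Gauge[Long] { override def getValue: Long = value })
    }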
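
Finally, this is the kind of configuration I mean in step 5. The entries can go into a metrics.properties file (pointed to by spark.metrics.conf) or be passed individually as spark.metrics.conf.*.sink.kafka.<key> Spark properties; the broker, topic, and period values are placeholders, and the property names match the sink sketched above:

    # metrics.properties
    *.sink.kafka.class=org.apache.spark.metrics.sink.KafkaSink
    *.sink.kafka.broker=broker-1:9092
    *.sink.kafka.topic=spark-metrics
    *.sink.kafka.period=10

    # or equivalently on spark-submit:
    #   --conf "spark.metrics.conf.*.sink.kafka.class=org.apache.spark.metrics.sink.KafkaSink"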

    I am starting to believe that there is no way to do it, even in Spark 3. I cannot find an answer to my own question either: https://stackoverflow.com/questions/63012890/how-to-create-a-source-to-export-metrics-from-spark-to-another-sink-prometheus – Felipe Jul 21 '20 at 13:24
