I posted the question below on the Spark user mailing list, but as usual there was no response from the community.
What is the best way to instrument metrics of a Spark application from both the Driver and the Executors?
I am trying to send my Spark application metrics to Kafka, and I found two approaches.
Approach 1: Implement a custom Source and Sink, and use the Source for instrumenting from both the Driver and the Executors (via SparkEnv.get.metricsSystem).
Approach 2: Write a Dropwizard/Gobblin-style KafkaReporter and use it directly for instrumentation from both the Driver and the Executors.
Which is the better approach? And if I go with Approach 1, how do I restrict the metrics to application-specific ones?
I tried Approach 1, but when I launch my application all the containers get killed.
The steps I followed are below:
As there is no KafkaSink in org.apache.spark.metrics.sink, I implemented my own KafkaSink and KafkaReporter as suggested in https://github.com/erikerlandson/spark-kafka-sink. A trimmed-down sketch of the sink is shown below.
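Roughly what my KafkaSink looks like (simplified sketch; KafkaReporter is my custom Dropwizard ScheduledReporter, and the broker/topic/period property names are my own choices). Since the Sink trait is private[spark], the class lives under the org.apache.spark package, and the three-argument constructor is the one Spark's MetricsSystem instantiates reflectively (Spark 2.x):

package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.MetricRegistry
import org.apache.spark.SecurityManager

class KafkaSink(val property: Properties,
                val registry: MetricRegistry,
                securityMgr: SecurityManager) extends Sink {

  // Settings come from the metrics properties (see the configuration step at the end)
  private val broker = property.getProperty("broker", "localhost:9092")
  private val topic  = property.getProperty("topic", "spark-metrics")
  private val period = property.getProperty("period", "10").toInt

  // KafkaReporter is my custom ScheduledReporter that publishes the registry to Kafka
  private val reporter = new KafkaReporter(registry, broker, topic)

  override def start(): Unit = reporter.start(period.toLong, TimeUnit.SECONDS)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}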
Implemented SparkMetricsSource by extending org.apache.spark.metrics.source.Source (sketch below).
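The Source itself is small. A trimmed-down sketch (the prefix constructor argument and the registerGauge helper are my own additions; Source is also private[spark], so this sits under the org.apache.spark package as well):

package org.apache.spark.metrics.source

import com.codahale.metrics.{Gauge, MetricRegistry}

class SparkMetricsSource(prefix: String) extends Source {
  override val sourceName: String = prefix
  override val metricRegistry: MetricRegistry = new MetricRegistry

  // Registers a constant-valued gauge; the name parts are joined into one metric key
  def registerGauge(appId: String, schema: String, name: String, value: Long): Unit =
    metricRegistry.register(
      MetricRegistry.name(appId, schema, name),
      new Gauge[Long] { override def getValue: Long = value })
}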
Registered the source:
val sparkMetricsSource = new SparkMetricsSource("spark.xyz.app.prefix")
SparkEnv.get.metricsSystem.registerSource(sparkMetricsSource)
Instrumented the metrics:
sparkMetricsSource.registerGauge(spark.sparkContext.applicationId, schema, "app-start", System.currentTimeMillis)
Configured the Sink through Spark properties.
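The wiring looks roughly like this in metrics.properties (or the same keys prefixed with spark.metrics.conf. passed via --conf). The broker/topic/period keys are simply whatever my KafkaSink reads, and the broker address is a placeholder:

# Attach the custom sink to all instances (driver and executors)
*.sink.kafka.class=org.apache.spark.metrics.sink.KafkaSink
*.sink.kafka.broker=kafka-broker-1:9092
*.sink.kafka.topic=spark-metrics
*.sink.kafka.period=10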