I posted the question below on the Spark user mailing list, but as usual there was no response from the community.
What is the best way to instrument metrics of a Spark application from both the Driver and the Executors?
I am trying to send my Spark application metrics to Kafka, and I found two approaches.
Approach 1: Implement a custom Source and Sink, and use the Source for instrumenting from both the Driver and the Executors (via SparkEnv.get.metricsSystem).
Approach 2: Write a Dropwizard/Gobblin-style KafkaReporter and use it directly for instrumentation from both the Driver and the Executors.
Which is the better approach? And if I go with Approach 1, how do I restrict the metrics to application-specific ones?
I tried Approach 1, but when I launch my application all the containers get killed.
The steps I followed are below:
As there is no KafkaSink in org.apache.spark.metrics.sink, I implemented my own KafkaSink and KafkaReporter as suggested in https://github.com/erikerlandson/spark-kafka-sink. A trimmed-down sketch of the sink is shown below.
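Roughly what my KafkaSink looks like (simplified sketch; KafkaReporter is my custom Dropwizard ScheduledReporter, and the broker/topic/period property names are my own choices). Since the Sink trait is private[spark], the class lives under the org.apache.spark package, and the three-argument constructor is the one Spark's MetricsSystem instantiates reflectively (Spark 2.x):

package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.MetricRegistry
import org.apache.spark.SecurityManager

class KafkaSink(val property: Properties,
                val registry: MetricRegistry,
                securityMgr: SecurityManager) extends Sink {

  // Settings come from the metrics properties (see the configuration step at the end)
  private val broker = property.getProperty("broker", "localhost:9092")
  private val topic  = property.getProperty("topic", "spark-metrics")
  private val period = property.getProperty("period", "10").toInt

  // KafkaReporter is my custom ScheduledReporter that publishes the registry to Kafka
  private val reporter = new KafkaReporter(registry, broker, topic)

  override def start(): Unit = reporter.start(period.toLong, TimeUnit.SECONDS)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}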
Implemented SparkMetricsSource by extending org.apache.spark.metrics.source.Source (sketch below).
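The Source itself is small. A trimmed-down sketch (the prefix constructor argument and the registerGauge helper are my own additions; Source is also private[spark], so this sits under the org.apache.spark package as well):

package org.apache.spark.metrics.source

import com.codahale.metrics.{Gauge, MetricRegistry}

class SparkMetricsSource(prefix: String) extends Source {
  override val sourceName: String = prefix
  override val metricRegistry: MetricRegistry = new MetricRegistry

  // Registers a constant-valued gauge; the name parts are joined into one metric key
  def registerGauge(appId: String, schema: String, name: String, value: Long): Unit =
    metricRegistry.register(
      MetricRegistry.name(appId, schema, name),
      new Gauge[Long] { override def getValue: Long = value })
}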
Registered the source:
val sparkMetricsSource = new SparkMetricsSource("spark.xyz.app.prefix")
SparkEnv.get.metricsSystem.registerSource(sparkMetricsSource)
Instrumented the metrics:
sparkMetricsSource.registerGauge(spark.sparkContext.applicationId, schema, "app-start", System.currentTimeMillis)
Configured the Sink through Spark properties.
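The wiring looks roughly like this in metrics.properties (or the same keys prefixed with spark.metrics.conf. passed via --conf). The broker/topic/period keys are simply whatever my KafkaSink reads, and the broker address is a placeholder:

# Attach the custom sink to all instances (driver and executors)
*.sink.kafka.class=org.apache.spark.metrics.sink.KafkaSink
*.sink.kafka.broker=kafka-broker-1:9092
*.sink.kafka.topic=spark-metrics
*.sink.kafka.period=10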