
I've written a custom metrics source/sink for my Spark Streaming app and I'm trying to initialize it from metrics.properties, but that doesn't work on the executors. I don't have control over the machines in the Spark cluster, so I can't copy the properties file to $SPARK_HOME/conf/ on each node. I do have it in the fat jar where my app lives, but by the time the fat jar is downloaded to the worker nodes, the executors have already started and their metrics system is already initialized, so it doesn't pick up the file with my custom source configuration.
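
For context, the file I'm bundling looks roughly like this (class names are placeholders for my actual source/sink; the keys follow the standard [instance].[source|sink].[name].[option] format):

```properties
# Custom source registered on executors (class name is a placeholder)
executor.source.mySource.class=com.example.metrics.MySource

# Custom sink on all instances, plus one option the sink reads
*.sink.mySink.class=com.example.metrics.MySink
executor.sink.mySink.propName=myProp
```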

Following this post, I've specified spark.files=metrics.properties and spark.metrics.conf=metrics.properties, but by the time metrics.properties is shipped to the executors, their metrics system is already initialized.
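
Concretely, a sketch of what I'm setting (the same keys could also go into spark-defaults.conf or be passed via --conf):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("my-streaming-app")                   // placeholder
  .set("spark.files", "metrics.properties")         // ship the file to executors
  .set("spark.metrics.conf", "metrics.properties")  // point the metrics system at it
```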

If I initialize my own metrics system, it picks up my file, but then I'm missing master/executor-level metrics/properties (e.g. executor.sink.mySink.propName=myProp: the sink can't read propName) since those are initialized by Spark's metrics system.
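
To illustrate what breaks, here is a simplified sketch of my sink. Spark's MetricsSystem instantiates sinks reflectively through a (Properties, MetricRegistry, SecurityManager) constructor (at least in 2.0), and since the Sink trait is private[spark], the class has to live under the org.apache.spark package. Class and property names are placeholders:

```scala
package org.apache.spark.metrics.sink  // Sink is private[spark]

import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.SecurityManager

private[spark] class MySink(
    val property: Properties,    // populated from metrics.properties by Spark
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {

  // Null if Spark's metrics system never saw executor.sink.mySink.propName
  private val propName: String = property.getProperty("propName")

  override def start(): Unit = { /* start a reporter here */ }
  override def stop(): Unit = { /* stop the reporter */ }
  override def report(): Unit = { /* flush metrics */ }
}
```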

Is there a (programmatic) way to have 'metrics.properties' shipped before executors initialize their metrics system?

Update 1: I am trying this on a standalone Spark 2.0.0 cluster.

Update 2: One hack I've thought of for achieving this: before starting your 'actual' Spark job, start a dummy job that copies metrics.properties onto each worker (sketched below), then start the actual job with the pre-known file location. Con: if a worker dies and another worker takes its place, the new one won't have the file at the pre-known path. Alternative: when a new worker machine starts, it also pulls metrics.properties from your git repo and places it at the pre-known path. Although this may work, it's terribly hacky; the preferred solution is for Spark to support it internally.
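
A sketch of the dummy-job idea, with the caveat that there's no guarantee every worker machine actually receives a task (the path and the parallelism level are assumptions):

```scala
import java.nio.file.{Files, Paths, StandardCopyOption}

val knownPath = "/tmp/metrics.properties"  // pre-known location, placeholder

// Over-provision tasks, hoping to hit every worker at least once.
// Concurrent tasks on one machine may race on the copy; acceptable
// here since they all write identical content.
sc.parallelize(0 until 1000, numSlices = 1000).foreachPartition { _ =>
  // metrics.properties is bundled at the root of the fat jar
  val in = getClass.getResourceAsStream("/metrics.properties")
  if (in != null) {
    Files.copy(in, Paths.get(knownPath), StandardCopyOption.REPLACE_EXISTING)
    in.close()
  }
}
// Then submit the actual job with spark.metrics.conf=/tmp/metrics.properties
```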

K P

2 Answers


See Spark metrics on wordcount example. Basically, I believe you need to add --files to send metrics.properties to all the workers.
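
For example, a submission along these lines (the app class and jar names are placeholders):

```shell
spark-submit \
  --files metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  --class com.example.MyApp \
  my-app-assembly.jar
```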

Assaf Mendelson
  • I've tried that. Two problems: 1) it doesn't copy metrics.properties to the driver machine, since --files only copies files into the executors' working directories, and 2) by the time those files are copied to the executors, the metrics system has already tried to initialize and failed for lack of the file. – K P Sep 07 '16 at 05:34

SparkConf only loads local system properties if they start with the prefix spark. Have you tried loading your properties with the spark. prefix added?
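
For example (a sketch; the key name is made up, the point is just the spark. prefix so SparkConf retains it across the cluster):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.sink.mySink.propName", "myProp")

// Later, wherever you wire up your own metrics system, pull the
// options back out of the conf:
val sinkOptions: Map[String, String] = conf.getAll
  .filter { case (k, _) => k.startsWith("spark.executor.sink.mySink.") }
  .map { case (k, v) => k.stripPrefix("spark.executor.sink.mySink.") -> v }
  .toMap
```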

jlopezmat
  • I don't quite follow; what does this mean in the context of the question? – K P Sep 15 '16 at 04:07
  • `private val masterMetricsSystem = MetricsSystem.createMetricsSystem("master", conf, securityMgr)` This conf value is a SparkConf, so if you can insert your properties into this conf, you can create your metrics system with your custom properties (in your example, spark.executor.sink.mySink.propName=myProp) and then read it in your sink. I've never tried to do anything with the metrics system, so I don't know if you can use this, but I have changed some executor properties by adding --conf to spark-submit, so I hope this is useful for you. – jlopezmat Sep 15 '16 at 08:18
  • I saw this code too, and metrics properties are specified as executor.source.mysource.class=MyClass. Thanks for the reply, but it doesn't work. – K P Sep 16 '16 at 18:38