5

I am trying to deploy a Spark job using spark-submit, which takes a bunch of parameters, like:

spark-submit --class Eventhub --master yarn --deploy-mode cluster --executor-memory 1024m --executor-cores 4 --files app.conf spark-hdfs-assembly-1.0.jar --conf "app.conf"

I was looking for a way to put all these flags in a file and pass that file to spark-submit, so that my spark-submit command becomes as simple as this:

spark-submit --class Eventhub --master yarn --deploy-mode cluster --config-file my-app.cfg --files app.conf spark-hdfs-assembly-1.0.jar --conf "app.conf"

Does anyone know if this is possible?

roy

2 Answers

8

You can use --properties-file, which should contain parameters whose keys start with spark., e.g.:

spark.driver.memory 5g
spark.executor.memory 10g

And the command should look like:

spark-submit --class Eventhub --master yarn --deploy-mode cluster --properties-file <path-to-your-conf-file> --files app.conf spark-hdfs-assembly-1.0.jar --conf "app.conf"
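For reference, the command-line flags from the question have documented property equivalents, so the original spark-submit call could be driven by a properties file like this (one key-value pair per line, whitespace-separated):

```
spark.executor.memory  1024m
spark.executor.cores   4
```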
FaigB
  • I tried `spark-submit --class StreamingEventWriterDriver --master yarn --deploy-mode cluster --properties-file properties.conf --files streaming.conf spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar -conf "streaming.conf"` but got this error https://gist.github.com/anonymous/a18bc346841fe2012510ca30d0b3539c – roy Mar 23 '17 at 16:40
  • If I run this `spark-submit --class StreamingEventWriterDriver --master yarn --deploy-mode cluster --executor-memory 3072m --executor-cores 4 --files streaming.conf spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar -conf "streaming.conf"` works fine – roy Mar 23 '17 at 16:44
  • The error occurred due to an incorrect HDFS config. The path for the tar file which should be loaded is invalid. – FaigB Mar 23 '17 at 19:52
  • Is there a way to correct it when using `--properties-file properties.conf`? – roy Mar 23 '17 at 20:09
  • What is the content of your properties.conf? – FaigB Mar 23 '17 at 20:46
  • This what I have in properties.conf https://paste.fedoraproject.org/paste/PvirSfGpdPQGcOoK6smYeV5M1UNdIGYhyRLivL9gydE= – roy Mar 24 '17 at 00:06
  • Please check core-site.xml for the hdfs property. It is obvious that it couldn't resolve the hdfs configs. – FaigB Mar 24 '17 at 08:40
  • But same command works with `--executor-memory 3072m --executor-cores 4` instead of `--properties-file properties.conf`. So I don't think hdfs config has an issue. – roy Mar 24 '17 at 11:24
0

Besides passing --properties-file as @FaigB mentioned, another way is to use conf/spark-defaults.conf. You can find where it resides by running find-spark-home, or by locating and looking into spark-env.sh. Alternatively, you can define where this config directory lives by setting the SPARK_CONF_DIR environment variable when (or before) you call spark-submit, e.g., SPARK_CONF_DIR=/your_dir/ spark-submit .... Note that if you are working with YARN, setting SPARK_CONF_DIR will not work. You can find out more here: https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties
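As a sketch, assuming your defaults are kept in a spark-defaults.conf inside a directory of your choosing (/your_dir/ below is a placeholder path), the invocation would look like:

```shell
# /your_dir/spark-defaults.conf contains lines such as:
#   spark.executor.memory  1024m
#   spark.executor.cores   4

# Point Spark at that directory so spark-defaults.conf is picked up,
# removing the need for --executor-memory / --executor-cores flags
SPARK_CONF_DIR=/your_dir/ spark-submit --class Eventhub --master yarn \
  --deploy-mode cluster --files app.conf \
  spark-hdfs-assembly-1.0.jar --conf "app.conf"
```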

NucFlash