
I am getting an exception, org.apache.spark.SparkException: A master URL must be set in your configuration

I used spark2-submit with --deploy-mode cluster and --master yarn. From my understanding, I should not be getting this exception when yarn is the master.


Submit Script

export JAVA_HOME=/usr/java/jdk1.8.0_131/
spark2-submit --class com.example.myapp.ClusterEntry \
    --name "Hello World" \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 1g \
    --executor-memory 1g \
    --executor-cores 3 \
    --packages org.apache.kudu:kudu-spark2_2.11:1.4.0 \
    myapp.jar myconf.file

Exception

18/03/14 15:31:47 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.0 (TID 3, vm6.adcluster, executor 1): org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at com.example.myapp.dao.KuduSink.open(KuduSink.scala:18)
    at org.apache.spark.sql.execution.streaming.ForeachSink$$anonfun$addBatch$1.apply(ForeachSink.scala:50)
    at org.apache.spark.sql.execution.streaming.ForeachSink$$anonfun$addBatch$1.apply(ForeachSink.scala:49)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

The cluster is a Cloudera cluster running Spark 2.2. I noticed that the app's KuduSink appears in the stack trace, so perhaps the master URL error comes from the KuduContext? However, I was not getting this error when running the app locally for dev.
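For reference, KuduSink is a ForeachWriter whose open() builds a SparkSession via getOrCreate(). A minimal sketch of the pattern (reconstructed; only the getOrCreate() call at KuduSink.scala:18 in the trace is certain, the rest of the class body is assumed):

import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}

// Reconstructed sketch -- only the getOrCreate() call in open()
// (KuduSink.scala:18 in the trace) is confirmed; the rest is assumed.
class KuduSink(kuduMaster: String, table: String) extends ForeachWriter[Row] {

  override def open(partitionId: Long, version: Long): Boolean = {
    // open() runs on an executor. In cluster mode the executor JVM has no
    // active SparkSession and no spark.master in its conf, so getOrCreate()
    // tries to construct a brand-new SparkContext and throws
    // "A master URL must be set in your configuration".
    val spark = SparkSession.builder.getOrCreate()
    // ... build a KuduContext from spark.sparkContext and open the table ...
    true
  }

  override def process(row: Row): Unit = {
    // ... upsert the row into Kudu ...
  }

  override def close(errorOrNull: Throwable): Unit = ()
}

This would also explain why it works locally: in local mode the "executor" shares the driver JVM, where an active SparkSession already exists, so getOrCreate() simply returns it instead of trying to build a new context.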


1 Answer


You are correct: Spark on YARN does not require a master URL.

Ensure SPARK_HOME, HADOOP_CONF_DIR and YARN_CONF_DIR are configured correctly.

It sounds like you have two different versions of Spark in the same cluster: the CDH parcel ships with Spark 1.6 by default, and I assume you installed Spark 2 through a Custom Service Descriptor and configured the service correctly.

Ensure there is no overlap between the configurations of spark-submit (Spark 1) and spark2-submit (Spark 2).

Ensure the client configuration is deployed for the Spark 2 service.
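As a quick sanity check, you can log the master, deploy mode, and Spark version the driver actually received, which makes an overlapping or stale client configuration obvious. A minimal sketch (the object name ConfCheck is just an example):

import org.apache.spark.sql.SparkSession

object ConfCheck {
  def main(args: Array[String]): Unit = {
    // Runs on the driver; prints the effective values the job received.
    val spark = SparkSession.builder.getOrCreate()
    println(s"master        = ${spark.sparkContext.master}")
    println(s"deploy mode   = ${spark.sparkContext.deployMode}")
    println(s"spark version = ${spark.version}")
    spark.stop()
  }
}

If this prints a Spark 1.6 version or an unexpected master, the two Spark services' configurations are colliding.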

Thank you :) Everything but the client configuration is present, and tests have worked with simpler scripts. For the client config, I believe my submit script provides the conf options I need. After more investigation, I think the issue might be related to the KuduContext that is created from the SparkContext – Alter Mar 15 '18 at 19:16