
I'm using livy-server-0.2 to run Spark jobs. However, I can't change the default setting for spark.executor.cores; it doesn't take effect, while the other settings do.

It always uses 1 core to start an executor.

yarn     11893 11889  6 21:08 ?        00:00:01
/opt/jdk1.7.0_80/bin/java -server -XX:OnOutOfMemoryError=kill 
%p -Xms1024m -Xmx1024m -Djava.io.tmpdir=/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1487813931557_0603/container_1487813931557_0603_01_000026/tmp 
-Dspark.driver.port=51553 
-Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1487813931557_0603/container_1487813931557_0603_01_000026 
-XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend 
--driver-url spark://CoarseGrainedScheduler@10.1.1.81:51553 --executor-id 19 
--hostname master01.yscredit.com --cores 1 --app-id application_1487813931557_0603 
--user-class-path file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1487813931557_0603/container_1487813931557_0603_01_000026/__app__.jar

Here is my spark-defaults.conf file in $SPARK_HOME/conf:

spark.master=yarn
spark.submit.deployMode=cluster
spark.executor.instances=7
spark.executor.cores=6
spark.executor.memoryOverhead=1024
spark.yarn.executor.memoryOverhead=1400
spark.executor.memory=11264
spark.driver.memory=5g
spark.yarn.driver.memoryOverhead=600
spark.speculation=true
spark.yarn.executor.memoryOverhead=1400

Can anybody help me? Thanks!

Ron.Lin
  • In the Livy source code I see that it reads two configuration files, livy-client.conf and spark-defaults.conf, and livy-client.conf has higher priority than spark-defaults.conf. However, that is not the root cause of the setting not taking effect, because I also set spark.executor.cores in spark-defaults.conf. I assume there must be a configuration for spark.executor.cores somewhere else. – Ron.Lin Mar 12 '17 at 15:31
  • Can you please find a file called capacity-scheduler.xml in the cluster? – loneStar Nov 13 '17 at 12:13

2 Answers

8

I strongly advise you to read the Livy source code. Livy has little documentation, so you may run into problems that cannot be solved by googling. Livy is just middleware, and its code base is relatively small.

You can specify Spark parameters in three locations:

  • Location A: If you set the Spark parameter in the session-creation POST request you send to the Livy server, then configuration given in the POST request cannot be overridden by any configuration file. In other words, configuration in your POST request has the highest priority (an example request is shown after this list);

  • Location B: Next, in $LIVY_HOME/conf, you can set Spark parameters such as spark.driver.memory in spark-defaults.conf or livy-client.conf;

  • Location C: Finally, Livy also uses the configuration in $SPARK_HOME/conf/spark-defaults.conf, but it has the lowest priority; that is, only settings that do not appear in Location A or B take effect from here.
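
For example, here is a minimal sketch of Location A: creating an interactive session through Livy's REST API with curl. It assumes Livy is listening on its default port 8998 and uses a placeholder <livy-host>; the exact set of request fields (executorCores, conf, ...) depends on your Livy version, so check the REST API docs of your release.

curl -X POST -H "Content-Type: application/json" \
  -d '{
        "kind": "spark",
        "numExecutors": 7,
        "executorCores": 6,
        "executorMemory": "11g",
        "conf": { "spark.executor.cores": "6" }
      }' \
  http://<livy-host>:8998/sessions

Anything passed in this request body wins over livy-client.conf and both spark-defaults.conf files.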

wuchang
  • Yes, I ended up grokking through the sources as well - my trouble was the HiveMetastore not activating in EMR; there are several funny places like [this](https://github.com/apache/incubator-livy/blob/551cc53095f0a4b5382602ba0c296f8cf8932e44/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala#L347). So it seems that some of those livy.conf settings in **B** just deliberately break **other** values set in **C**. – Anton Kraievyi Jun 19 '18 at 10:47
1

There is a property in YARN's capacity scheduler that controls how resources are accounted for on the cluster; by default only memory is considered, so CPU (vcore) requests such as spark.executor.cores are not honoured.

sudo  vi /etc/hadoop/conf/capacity-scheduler.xml

Change the property to the following:

"yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalcul‌​ator"

For this property to take effect, you have to stop the YARN ResourceManager:

 sudo  hadoop-yarn-resourcemanager stop

and then start it again:

sudo  hadoop-yarn-resourcemanager start 
loneStar