
Below is my project's structure:

spark-application:

scala1.scala // I call the Java class from this class.

java.java // this submits another Spark application to the YARN cluster.

The Spark application that is triggered by the Java class:

scala2.scala

My reference tutorial is here.

When I run my Java class from scala1.scala via spark-submit in local mode, the second Spark application (scala2.scala) is triggered and works as expected.

But when I submit the same application via spark-submit to the YARN cluster, it fails with the error below:

Error: Could not find or load main class 
org.apache.spark.deploy.yarn.ApplicationMaster

 Application application_1493671618562_0072 failed 5 times due to AM Container for appattempt_1493671618562_0072_000005 exited with exitCode: 1
 For more detailed output, check the application tracking page: http://headnode.internal.cloudapp.net:8088/cluster/app/application_1493671618562_0072 Then click on links to logs of each attempt.
 Diagnostics: Exception from container-launch.
 Container id: container_e02_1493671618562_0072_05_000001
 Exit code: 1
 Exception message: /mnt/resource/hadoop/yarn/local/usercache/helixuser/appcache/application_1493671618562_0072/container_e02_1493671618562_0072_05_000001/launch_container.sh: line 26: $PWD:$PWD/spark_conf:$PWD/spark.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/:/usr/hdp/current/hadoop-client/lib/:/usr/hdp/current/hadoop-hdfs-client/:/usr/hdp/current/hadoop-hdfs-client/lib/:/usr/hdp/current/hadoop-yarn-client/:/usr/hdp/current/hadoop-yarn-client/lib/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/:$PWD/mr-framework/hadoop/share/hadoop/common/:$PWD/mr-framework/hadoop/share/hadoop/common/lib/:$PWD/mr-framework/hadoop/share/hadoop/yarn/:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
 Stack trace: ExitCodeException exitCode=1: /mnt/resource/hadoop/yarn/local/usercache/helixuser/appcache/application_1493671618562_0072/container_e02_1493671618562_0072_05_000001/launch_container.sh: line 26: $PWD:$PWD/spark_conf:$PWD/spark.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/:/usr/hdp/current/hadoop-client/lib/:/usr/hdp/current/hadoop-hdfs-client/:/usr/hdp/current/hadoop-hdfs-client/lib/:/usr/hdp/current/hadoop-yarn-client/:/usr/hdp/current/hadoop-yarn-client/lib/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/:$PWD/mr-framework/hadoop/share/hadoop/common/:$PWD/mr-framework/hadoop/share/hadoop/common/lib/:$PWD/mr-framework/hadoop/share/hadoop/yarn/:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
 at org.apache.hadoop.util.Shell.run(Shell.java:844)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
 at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Container exited with a non-zero exit code 1
 Failing this attempt. Failing the application.

The contents of the YARN container's working directory are listed below:

lrwxrwxrwx 1 yarn hadoop   95 May  5 06:03 __app__.jar -> /mnt/resource/hadoop/yarn/local/filecache/10/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar
-rw-r--r-- 1 yarn hadoop   74 May  5 06:03 container_tokens
-rwx------ 1 yarn hadoop  710 May  5 06:03 default_container_executor_session.sh
-rwx------ 1 yarn hadoop  764 May  5 06:03 default_container_executor.sh
-rwx------ 1 yarn hadoop 6433 May  5 06:03 launch_container.sh
lrwxrwxrwx 1 yarn hadoop  102 May  5 06:03 __spark_conf__ -> /mnt/resource/hadoop/yarn/local/usercache/helixuser/filecache/80/__spark_conf__6125877397366945561.zip
lrwxrwxrwx 1 yarn hadoop  125 May  5 06:03 __spark__.jar -> /mnt/resource/hadoop/yarn/local/usercache/helixuser/filecache/81/spark-assembly-1.6.3.2.5.4.0-121-hadoop2.7.3.2.5.4.0-121.jar
drwx--x--- 2 yarn hadoop 4096 May  5 06:03 tmp
find -L . -maxdepth 5 -ls:
3933556      4 drwx--x---   3 yarn     hadoop       4096 May  5 06:03 .
3933558      4 drwx--x---   2 yarn     hadoop       4096 May  5 06:03 ./tmp
3933562      4 -rw-r--r--   1 yarn     hadoop         60 May  5 06:03 ./.launch_container.sh.crc
3933517 185944 -r-x------   1 yarn     hadoop   190402950 May  5 06:03 ./__spark__.jar
3933564      4 -rw-r--r--   1 yarn     hadoop          16 May  5 06:03 ./.default_container_executor_session.sh.crc
3933518      4 drwx------   2 yarn     hadoop        4096 May  5 06:03 ./__spark_conf__
3933548      4 -r-x------   1 yarn     hadoop         945 May  5 06:03 ./__spark_conf__/taskcontroller.cfg
3933543      4 -r-x------   1 yarn     hadoop         249 May  5 06:03 ./__spark_conf__/slaves
3933541      4 -r-x------   1 yarn     hadoop        2316 May  5 06:03 ./__spark_conf__/ssl-client.xml.example
3933520      4 -r-x------   1 yarn     hadoop        1734 May  5 06:03 ./__spark_conf__/log4j.properties
3933526      4 -r-x------   1 yarn     hadoop         265 May  5 06:03 ./__spark_conf__/hadoop-metrics2-azure-file-system.properties
3933536      4 -r-x------   1 yarn     hadoop        1045 May  5 06:03 ./__spark_conf__/container-executor.cfg
3933519      8 -r-x------   1 yarn     hadoop        5685 May  5 06:03 ./__spark_conf__/hadoop-env.sh
3933531      4 -r-x------   1 yarn     hadoop        2358 May  5 06:03 ./__spark_conf__/topology_script.py
3933547      8 -r-x------   1 yarn     hadoop        4113 May  5 06:03 ./__spark_conf__/mapred-queues.xml.template
3933528      4 -r-x------   1 yarn     hadoop         744 May  5 06:03 ./__spark_conf__/ssl-client.xml
3933544      4 -r-x------   1 yarn     hadoop         417 May  5 06:03 ./__spark_conf__/topology_mappings.data
3933549      4 -r-x------   1 yarn     hadoop         342 May  5 06:03 ./__spark_conf__/__spark_conf__.properties
3933523      4 -r-x------   1 yarn     hadoop         247 May  5 06:03 ./__spark_conf__/hadoop-metrics2-adl-file-system.properties
3933535      4 -r-x------   1 yarn     hadoop        1020 May  5 06:03 ./__spark_conf__/commons-logging.properties
3933525     24 -r-x------   1 yarn     hadoop       22138 May  5 06:03 ./__spark_conf__/yarn-site.xml
3933529      4 -r-x------   1 yarn     hadoop        2450 May  5 06:03 ./__spark_conf__/capacity-scheduler.xml
3933538      4 -r-x------   1 yarn     hadoop        2490 May  5 06:03 ./__spark_conf__/hadoop-metrics.properties
 3933534     12 -r-x------   1 yarn     hadoop        8754 May  5 06:03 ./__spark_conf__/hdfs-site.xml
 3933533      8 -r-x------   1 yarn     hadoop        4261 May  5 06:03 ./__spark_conf__/yarn-env.sh
 3933532      4 -r-x------   1 yarn     hadoop        1335 May  5 06:03 ./__spark_conf__/configuration.xsl
 3933530      4 -r-x------   1 yarn     hadoop         758 May  5 06:03 ./__spark_conf__/mapred-site.xml.template
 3933545      4 -r-x------   1 yarn     hadoop        1000 May  5 06:03 ./__spark_conf__/ssl-server.xml
 3933527      8 -r-x------   1 yarn     hadoop        4680 May  5 06:03 ./__spark_conf__/core-site.xml
 3933522      8 -r-x------   1 yarn     hadoop        5783 May  5 06:03 ./__spark_conf__/hadoop-metrics2.properties
 3933542      4 -r-x------   1 yarn     hadoop        1308 May  5 06:03 ./__spark_conf__/hadoop-policy.xml
 3933540      4 -r-x------   1 yarn     hadoop        1602 May  5 06:03 ./__spark_conf__/health_check
 3933537      8 -r-x------   1 yarn     hadoop        4221 May  5 06:03 ./__spark_conf__/task-log4j.properties
 3933521      8 -r-x------   1 yarn     hadoop        7596 May  5 06:03 ./__spark_conf__/mapred-site.xml
 3933546      4 -r-x------   1 yarn     hadoop        2697 May  5 06:03 ./__spark_conf__/ssl-server.xml.example
 3933539      4 -r-x------   1 yarn     hadoop         752 May  5 06:03 ./__spark_conf__/mapred-env.sh
 3932820 135852 -r-xr-xr-x   1 yarn     hadoop   139105807 May  4 22:53 ./__app__.jar
 3933566      4 -rw-r--r--   1 yarn     hadoop          16 May  5 06:03 ./.default_container_executor.sh.crc
 3933563      4 -rwx------   1 yarn     hadoop         710 May  5 06:03 ./default_container_executor_session.sh
 3933559      4 -rw-r--r--   1 yarn     hadoop          74 May  5 06:03 ./container_tokens
3933565      4 -rwx------   1 yarn     hadoop         764 May  5 06:03 ./default_container_executor.sh
3933560      4 -rw-r--r--   1 yarn     hadoop          12 May  5 06:03 ./.container_tokens.crc
3933561      8 -rwx------   1 yarn     hadoop        6433 May  5 06:03 ./launch_container.sh
broken symlinks(find -L . -maxdepth 5 -type l -ls):

Below is the Java code that invokes the second Spark application:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

public class CallingSparkJob {

    public void submitJob(String latestreceivedpitrL, String newPtr) throws Exception {
        System.out.println("In submitJob method");
        try {
            System.out.println("Building the Spark command");

            // arguments to be passed to org.apache.spark.deploy.yarn.Client
            String[] args = new String[] {
                // the name of the application
                "--name",
                "name",

                // "--master", "yarn",
                // "--deploy-mode", "cluster",

                // "--conf", "spark.yarn.executor.memoryOverhead=600", "--conf",
                "spark.yarn.submit.waitAppCompletion=false",

                // memory for the driver (optional)
                "--driver-memory",
                "1000M",

                "--num-executors",
                "2",
                "--executor-cores",
                "2",

                // path to the application's JAR file (required in yarn-cluster mode)
                "--jar",
                "wasb://storage_account_container@storageaccount.blob.core.windows.net/user/ankushuser/sparkfiller/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar",

                // name of the application's main class (required)
                "--class",
                "com.test.SparkFiller",

                // comma-separated list of local jars for SparkContext.addJar
                // "--addJars",
      // "/Users/mparsian/zmp/github/data-algorithms-book/lib/spark-assembly-1.5.2-hadoop2.6.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/log4j-1.2.17.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/junit-4.12-beta-2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jsch-0.1.42.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/JeraAntTasks.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jedis-2.5.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jblas-1.2.3.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/hamcrest-all-1.3.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/guava-18.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-math3-3.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-math-2.2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-logging-1.1.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-lang3-3.4.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-lang-2.6.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-io-2.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-httpclient-3.0.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-daemon-1.0.5.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-configuration-1.6.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-collections-3.2.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-cli-1.2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/cloud9-1.3.2.jar",


                // argument 1: latestreceivedpitrL
                "--arg",
                latestreceivedpitrL,

                // argument 2: newPtr
                "--arg",
                newPtr,

                "--arg",
                "yarn-cluster"
            };

            System.out.println("Creating a Hadoop Configuration object");

            // create a Hadoop Configuration object
            Configuration config = new Configuration();

            // identify that Spark will be used in YARN mode
            System.setProperty("SPARK_YARN_MODE", "true");

            // create an instance of SparkConf
            SparkConf sparkConf = new SparkConf();
            sparkConf.setSparkHome("/usr/hdp/current/spark-client");
            // sparkConf.setMaster("yarn");
            sparkConf.setMaster("yarn-cluster");

            // sparkConf.setAppName("spark-yarn");
            // sparkConf.set("master", "yarn");
            // sparkConf.set("spark.submit.deployMode", "cluster"); // worked

            // create ClientArguments, which will be passed to Client
            ClientArguments cArgs = new ClientArguments(args, sparkConf);

            // create an instance of the YARN Client
            Client client = new Client(cArgs, config, sparkConf);

            // submit the Spark job to YARN
            client.run();
        } catch (Exception e) {
            System.out.println("Error submitting Spark job");
            System.out.println(e.getMessage());
        }
    }
}

The spark-submit command used to run the first Spark application locally:

spark-submit --class scala1 --master yarn --deploy-mode cluster --num-executors 2 --executor-cores 2 --conf spark.yarn.executor.memoryOverhead=600 --conf spark.yarn.submit.waitAppCompletion=false /home/ankushuser/kafka_retry/kafka_retry_test/sparkflightaware/target/sparkflightaware-0.0.1-SNAPSHOT-jar-with-dependencies.jar

If I run this spark-submit command locally, it invokes the Java class, and the spark-submit for the second scala2 application works fine.

If I run it in YARN mode, I face the issue above.

Thank you for your help.

ankush reddy
  • Seems like something to do with the way you are running your application. Please post the command used to run your application. Also, it would be helpful if you could post the code where you are getting this error. – code May 05 '17 at 07:07
  • Spark is not able to find the class `com.test.SparkFiller` in wasb://storage_account_container@storageaccount.blob.core.windows.net/user/ankushuser/sparkfiller/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar. Could you verify if that class is present in the jar under `com.test.` package? – code May 05 '17 at 07:44
  • @code Yes, I have that class in the jar. If I use a local spark-submit (spark-submit --class scala1 --master local --deploy-mode cluster --num-executors 2 --executor-cores 2 --conf spark.yarn.executor.memoryOverhead=600 --conf spark.yarn.submit.waitAppCompletion=false /home/ankushuser/kafka_retry/kafka_retry_test/sparkflightaware/target/sparkflightaware-0.0.1-SNAPSHOT-jar-with-dependencies.jar), the spark-submit that invokes the scala2 class (com.test.SparkFiller) runs fine. – ankush reddy May 05 '17 at 07:53
  • Ok, could you verify if this `wasb://storage_account_container@storageaccount.blob.core.windows.net/user/ankushuser/sparkfiller/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar` jar is accessible to all the nodes in the cluster? – code May 05 '17 at 08:01
  • I have the cluster setup on that storage account which technically should have access to all the nodes of the cluster. But is there any specific way to check if that file is accessible by all the nodes? – ankush reddy May 05 '17 at 08:08
  • I could have been looking at the wrong place. Can you confirm whether the Spark home directory `/usr/hdp/current/spark-client` is present on the node where you are starting the Spark app? There is a high chance that Spark is not installed on the node where your `CallingSparkJob` is running. – code May 05 '17 at 08:33
  • Spark is installed on all the nodes. I logged into all the worker nodes and tried submitting the scala2 spark-submit separately, and it worked. – ankush reddy May 05 '17 at 08:45
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/143477/discussion-between-code-and-ankush-reddy-sugureddy). – code May 05 '17 at 08:49
  • Have you looked at http://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster ? – Rick Moritz May 11 '17 at 13:14
  • You don't need to call YARN when you're using spark-submit. Your program is already invoking a submit; wrapping another submit around it won't help. Either just run the program, or spark-submit your actual Spark code. – Rick Moritz May 12 '17 at 09:17
  • I am guessing that your yarn configuration files are corrupt, hard to tell which one. Can you update your configuration especially the classpath part of yarn-site.xml? – Ramesh Maharjan May 13 '17 at 06:16
  • Possible duplicate of ["Bad substitution" when submitting spark job to yarn-cluster](http://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster) – Rick Moritz May 16 '17 at 08:16

1 Answer

Since there's a bounty, I'll repost this as an answer as well, but in reality I would like to flag this as a duplicate, since the actual exception is the one covered and answered in another question:

The error is caused by hdp.version not getting substituted correctly. You have to set hdp.version in a file named java-opts under $SPARK_HOME/conf.
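For example, a minimal java-opts file could contain nothing but the flag below; the version string 2.5.4.0-121 is an assumption taken from the spark-assembly jar name in the container listing above, and it must match the subdirectory name under /usr/hdp on your cluster:

-Dhdp.version=2.5.4.0-121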

Alternatively, pass

--driver-java-options="-Dhdp.version=INSERT_VERSION_STRING_HERE" --conf "spark.executor.extraJavaOptions=-Dhdp.version=INSERT_VERSION_STRING_HERE"

on your spark-submit command line, and make sure to use the correct version string, i.e. the name of the subdirectory under /usr/hdp.
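As a sketch, the outer spark-submit from the question would then look something like this (the version string 2.5.4.0-121 is again an assumption and must be replaced with the value from your cluster):

spark-submit --class scala1 --master yarn --deploy-mode cluster --num-executors 2 --executor-cores 2 --conf spark.yarn.executor.memoryOverhead=600 --conf spark.yarn.submit.waitAppCompletion=false --driver-java-options="-Dhdp.version=2.5.4.0-121" --conf "spark.executor.extraJavaOptions=-Dhdp.version=2.5.4.0-121" /home/ankushuser/kafka_retry/kafka_retry_test/sparkflightaware/target/sparkflightaware-0.0.1-SNAPSHOT-jar-with-dependencies.jar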

If you want to stick with calling the YARN Client directly from your code, then you need to carry those options through the arguments or the SparkConf you build in your code.
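One possible way to do that, as a sketch only, is to set the equivalent Spark properties on the SparkConf that the question's CallingSparkJob already builds; the property names below are standard Spark settings, but whether the YARN Client in this particular HDP build picks them all up is an assumption, and the version string is again a placeholder:

// assumed HDP version string; replace with the subdirectory name under /usr/hdp
String hdpVersion = "2.5.4.0-121";

SparkConf sparkConf = new SparkConf();
sparkConf.setSparkHome("/usr/hdp/current/spark-client");
sparkConf.setMaster("yarn-cluster");

// forward -Dhdp.version to the ApplicationMaster/driver and to the executors
sparkConf.set("spark.driver.extraJavaOptions", "-Dhdp.version=" + hdpVersion);
sparkConf.set("spark.yarn.am.extraJavaOptions", "-Dhdp.version=" + hdpVersion);
sparkConf.set("spark.executor.extraJavaOptions", "-Dhdp.version=" + hdpVersion);

// then build ClientArguments and Client exactly as in the question
ClientArguments cArgs = new ClientArguments(args, sparkConf);
Client client = new Client(cArgs, new Configuration(), sparkConf);
client.run();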

Rick Moritz
  • I think it is already mentioned in the discussion in the comments above. Apparently it didn't solve the problem. – user1314742 May 11 '17 at 15:12
  • Ah, I see, that was hidden "in chat" - unfortunate. Anyway, that's still the error. I'll amend my answer to offer an alternative. – Rick Moritz May 12 '17 at 07:57