I'm trying to run spark-submit in standalone mode. My project compiles successfully in IntelliJ IDEA and I have built the associated jar file, but when I try to run the following:

[cloudera@quickstart bin]$ spark-submit --verbose --class graphx /home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar /usr/lib/spark/logs/temp.log

I'm getting the following output and error message:

Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.yarn.historyServer.address=http://quickstart.cloudera:18088
Adding default property: spark.dynamicAllocation.schedulerBacklogTimeout=1
Adding default property: spark.yarn.am.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.shuffle.service.port=7337
Adding default property: spark.master=yarn-client
Adding default property: spark.authenticate=false
Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.eventLog.dir=hdfs://quickstart.cloudera:8020/user/spark/applicationHistory
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.dynamicAllocation.minExecutors=0
Adding default property: spark.dynamicAllocation.executorIdleTimeout=60
Adding default property: spark.yarn.jar=local:/usr/lib/spark/lib/spark-assembly.jar
Parsed arguments:
  master                  yarn-client
  deployMode              null
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /usr/lib/spark/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  /usr/lib/hadoop/lib/native
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               graphx
  primaryResource         file:/home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar
  name                    graphx
  childArgs               [/usr/lib/spark/logs/temp.log]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /usr/lib/spark/conf/spark-defaults.conf:
  spark.executor.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.yarn.jar -> local:/usr/lib/spark/lib/spark-assembly.jar
  spark.driver.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.authenticate -> false
  spark.yarn.historyServer.address -> http://quickstart.cloudera:18088
  spark.yarn.am.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.eventLog.enabled -> true
  spark.dynamicAllocation.schedulerBacklogTimeout -> 1
  spark.serializer -> org.apache.spark.serializer.KryoSerializer
  spark.dynamicAllocation.executorIdleTimeout -> 60
  spark.dynamicAllocation.minExecutors -> 0
  spark.shuffle.service.enabled -> true
  spark.shuffle.service.port -> 7337
  spark.eventLog.dir -> hdfs://quickstart.cloudera:8020/user/spark/applicationHistory
  spark.master -> yarn-client
  spark.dynamicAllocation.enabled -> true


Main class:
graphx
Arguments:
/usr/lib/spark/logs/temp.log
System properties:
spark.executor.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.yarn.jar -> local:/usr/lib/spark/lib/spark-assembly.jar
spark.driver.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.authenticate -> false
spark.yarn.historyServer.address -> http://quickstart.cloudera:18088
spark.yarn.am.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.eventLog.enabled -> true
spark.dynamicAllocation.schedulerBacklogTimeout -> 1
SPARK_SUBMIT -> true
spark.serializer -> org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled -> true
spark.dynamicAllocation.minExecutors -> 0
spark.dynamicAllocation.executorIdleTimeout -> 60
spark.app.name -> graphx
spark.jars -> file:/home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar
spark.submit.deployMode -> client
spark.shuffle.service.port -> 7337
spark.eventLog.dir -> hdfs://quickstart.cloudera:8020/user/spark/applicationHistory
spark.master -> yarn-client
spark.dynamicAllocation.enabled -> true
Classpath elements:
file:/home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar


java.lang.ClassNotFoundException: graphx
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

My question is: where does the package have to be located? I have it under an IntelliJ IDEA project path. Should I copy it to another path under /usr/lib/spark/?

Thanks!

user1314742
florecitas

2 Answers

You have to provide a fully qualified class name to spark-submit.

Let's say your package is named com.me.application; the spark-submit command should then look something like: spark-submit --class com.me.application.graphx ....

Edit

As seen in the comments, your class name is FormatDataTlf, not graphx, and it lives in the package tlf, so the command becomes: spark-submit --class tlf.FormatDataTlf ....
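If you are unsure what the fully qualified name is, you can derive it from the entries inside the jar. Below is a minimal sketch, not anything Spark provides: `jar_entries_to_class_names` is a hypothetical helper, and the listing fed to it is hand-written for illustration, based on how Scala compiles an `object` into a `Name.class` plus a `Name$.class` entry.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark or the JDK): reads jar entry paths
# on stdin, one per line, and prints fully qualified top-level class names.
# It keeps only .class entries, drops names containing '$' (Scala companion
# and inner/anonymous classes), then turns path/Name.class into path.Name.
jar_entries_to_class_names() {
  grep '\.class$' | grep -v '\$' | sed -e 's/\.class$//' -e 's|/|.|g'
}

# Demo on an illustrative, hand-written listing (assumed, not real output).
printf '%s\n' \
  'META-INF/MANIFEST.MF' \
  'tlf/FormatDataTlf.class' \
  'tlf/FormatDataTlf$.class' |
  jar_entries_to_class_names
# prints: tlf.FormatDataTlf
```

For a real jar, pipe in the output of `jar tf <path-to-jar>` instead of the `printf` listing; any name it prints is a candidate for `--class`.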

user1314742
  • I don't know how to find out the fully qualified class name; I'm using the IntelliJ IDEA SDK – florecitas May 05 '16 at 08:46
  • Go to the file `graphx.scala`; the first uncommented line should look like `package xxx`, so the fully qualified name will be `xxx.graphx` – user1314742 May 05 '16 at 09:05
  • I have tried, but it is not working. The package is `tlf`, and the beginning of my Scala code looks like this: `import org.apache.spark.{SparkConf, SparkContext} /** * @author ${user.name} */ object FormatDataTlf {}` – florecitas May 05 '16 at 10:01
  • Could you post the output of the following command: `$ unzip -l /home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar | grep graphx` – user1314742 May 05 '16 at 10:46
  • I've just seen your other question, http://stackoverflow.com/questions/37045829/how-to-find-out-fully-qualified-name-intellij-project. I could not see a file named graphx, but from what I see the package name could be `tlf` – user1314742 May 05 '16 at 10:52
  • @evitas Maybe there isn't a class named graphx in your jar? I see in your comment that you declare an object FormatDataTlf, maybe this is your main class? – Daniel de Paula May 05 '16 at 11:10
  • This is the result: [cloudera@quickstart grafoTelefonos]$ unzip -l /home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar | grep graphx Archive: /home/cloudera/ideaProjects/grafoTelefonos/target/graphx-1.0-SNAPSHOT.jar 0 05-02-2016 09:59 META-INF/maven/spark-grafo/graphx/ 7348 04-25-2016 15:12 META-INF/maven/spark-grafo/graphx/pom.xml 110 05-02-2016 09:59 META-INF/maven/spark-grafo/graphx/pom.properties – florecitas May 05 '16 at 12:22
  • Sorry, I'm new to Spark and trying to use spark-submit for a standalone app. I have built a Maven project with IntelliJ and Scala, but cannot work out the appropriate syntax for the class – florecitas May 05 '16 at 12:24
  • OK, that's it: you do not have a class named graphx. I edited my answer an hour ago; have you tried what I suggested? – user1314742 May 05 '16 at 13:09
  • Thanks all! Finally, the correct syntax is: [cloudera@quickstart grafoTelefonos]$ /usr/lib/spark/bin/spark-submit --class tlf.FormatDataTlf ./target/graphx-1.0-SNAPSHOT.jar – florecitas May 05 '16 at 13:17
  • Hi all, I'm running into the same problem. I'm sure the class is in the jar and it's not under any package... Help, thank you. – Mohamed Seif Nov 22 '17 at 11:35
  • @MohamedSeif Can you please provide the spark-submit command and the full class name? – user1314742 Nov 29 '17 at 09:08
  • Command: spark-submit --class sparkpackage.StreamingKafkaGeoSpark2 /home/cloudera/streamingkafkageospark_jar/streamingkafkageospark.jar It says class not found for sparkpackage.StreamingKafkaGeoSpark2, even when I remove "--class sparkpackage.StreamingKafkaGeoSpark2 " – Mohamed Seif Nov 29 '17 at 10:02
  • I solved the problem by deleting the META-INF folder from the jar file. – Mohamed Seif Nov 29 '17 at 14:13

I got this working; the correct syntax is:

[cloudera@quickstart grafoTelefonos]$ /usr/lib/spark/bin/spark-submit --class tlf.FormatDataTlf ./target/graphx-1.0-SNAPSHOT.jar

(In my case I'm launching from the IDEA project directory, "grafoTelefonos".) The class name is FormatDataTlf, which is defined inside the package tlf.
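To see why `--class graphx` failed while `--class tlf.FormatDataTlf` works, you can check whether a given class name maps to an entry in the jar before submitting. This is only a sketch under assumptions: `class_in_listing` is a hypothetical helper of my own, and the listing is written by hand to mirror what a Scala `object FormatDataTlf` in package `tlf` would compile to, not copied from the real jar.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the fully qualified class name in $1
# corresponds to an entry in the jar listing read from stdin.
class_in_listing() {
  # tlf.FormatDataTlf -> tlf/FormatDataTlf.class
  entry="$(printf '%s' "$1" | tr '.' '/').class"
  # -F fixed string, -x whole-line match, -q no output (exit status only)
  grep -Fxq -- "$entry"
}

# Illustrative listing (assumed contents, not real output).
listing='tlf/FormatDataTlf.class
tlf/FormatDataTlf$.class'

for name in graphx tlf.FormatDataTlf; do
  if printf '%s\n' "$listing" | class_in_listing "$name"; then
    echo "$name: found"
  else
    echo "$name: not found"
  fi
done
# prints:
# graphx: not found
# tlf.FormatDataTlf: found
```

Against the real jar you would replace the hand-written listing with `jar tf ./target/graphx-1.0-SNAPSHOT.jar`.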

Matt
florecitas