~/spark/spark-2.1.1-bin-hadoop2.7/bin$ ./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/06/20 16:41:30 INFO RestSubmissionClient: Submitting a request to launch an application in spark://192.168.42.80:32141.
17/06/20 16:41:31 INFO RestSubmissionClient: Submission successfully created as driver-20170620204130-0005. Polling submission state...
17/06/20 16:41:31 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20170620204130-0005 in spark://192.168.42.80:32141.
17/06/20 16:41:31 INFO RestSubmissionClient: State of driver driver-20170620204130-0005 is now ERROR.
17/06/20 16:41:31 INFO RestSubmissionClient: Driver is running on worker worker-20170620203037-172.17.0.5-45429 at 172.17.0.5:45429.
17/06/20 16:41:31 ERROR RestSubmissionClient: Exception from the cluster:
java.nio.file.NoSuchFileException: /home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
    sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
    sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
    java.nio.file.Files.copy(Files.java:1274)
    org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:608)
    org.apache.spark.util.Utils$.copyFile(Utils.scala:579)
    org.apache.spark.util.Utils$.doFetchFile(Utils.scala:664)
    org.apache.spark.util.Utils$.fetchFile(Utils.scala:463)
    org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:154)
    org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:172)
    org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:91)
17/06/20 16:41:31 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20170620204130-0005",
  "serverSparkVersion" : "2.1.1",
  "submissionId" : "driver-20170620204130-0005",
  "success" : true
}

Log from spark-worker:

2017-06-20T20:41:30.807403232Z 17/06/20 20:41:30 INFO Worker: Asked to launch driver driver-20170620204130-0005
2017-06-20T20:41:30.817248508Z 17/06/20 20:41:30 INFO DriverRunner: Copying user jar file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar to /opt/spark/work/driver-20170620204130-0005/myproj-assembly-0.1.0.jar
2017-06-20T20:41:30.883645747Z 17/06/20 20:41:30 INFO Utils: Copying /home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar to /opt/spark/work/driver-20170620204130-0005/myproj-assembly-0.1.0.jar
2017-06-20T20:41:30.885217508Z 17/06/20 20:41:30 INFO DriverRunner: Killing driver process!
2017-06-20T20:41:30.885694618Z 17/06/20 20:41:30 WARN Worker: Driver driver-20170620204130-0005 failed with unrecoverable exception: java.nio.file.NoSuchFileException: home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar 

Any idea why? Thanks

UPDATE

Is the following command right?

./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar

UPDATE

I think I understand a little more about Spark and why I had this problem and the spark-submit error: ClassNotFoundException. The key point is that even though the word REST is used here (REST URL: spark://127.0.1.1:6066 (cluster mode)), the application jar is not uploaded to the cluster after submission, which is different from my understanding. So the Spark cluster cannot find the application jar and cannot load the main class.

I will try to find out how to set up a Spark cluster and use cluster mode to submit applications. I have no idea whether client mode will use more resources for streaming jobs.
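If client mode turns out to be acceptable, the jar does not need to be uploaded anywhere: in client mode the driver runs on the machine where spark-submit is invoked, so a local path is enough. A minimal sketch, assuming the master's legacy submission port (7077, or whatever Kubernetes maps it to) instead of the REST port:

./spark-submit --master spark://192.168.42.80:7077 \
  --deploy-mode client \
  file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar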

BAE
  • Interesting...why do you use `32141` not `7077`? Can you also strip `file://` and just use the regular path without the prefix? – Jacek Laskowski Jun 21 '17 at 03:05
  • @JacekLaskowski 1) Spark is running on Kubernetes, so the port mapping is 32141 -> 6066. 2) I tried the regular path; not working. – BAE Jun 21 '17 at 13:22
  • Does removing `--deploy-mode cluster` make any difference? Kubernetes is a new thing for Spark to support if I'm not mistaken so errors are _in the package_. – Jacek Laskowski Jun 21 '17 at 13:47
  • Why don't you simply `spark-submit --master spark://192.168.42.80:32141 target/scala-2.11/myproj-assembly-0.1.0.jar` while in the project directory (`/home/me/workspace/myproj`)? That would make the environment less _uncommon_. – Jacek Laskowski Jun 21 '17 at 13:49
  • @BAE, did you find a solution to this? I have a similar goal (spark submit to a standalone cluster with deploy-mode cluster) and am facing the same issue. Albeit with docker containers. – sujit Feb 06 '18 at 08:58
  • @sujit I did not find a better solution to it. I have no idea how to submit the jar to the Spark cluster directly; some distributed file system should be set up, the jar stored in it, and the Spark cluster given access to the jar there. – BAE Feb 06 '18 at 23:18
  • @BAE, yeah, had to do that. In fact I found that the jar path specified need not be resolvable at the host where spark-submit is being invoked in cluster deploy mode. Seems like it is just submitted as a parameter to the driver and executors, where the jar needs to be accessible. – sujit Feb 07 '18 at 13:50
  • I am facing the same issue. The solutions you provided do not work. – Key Jun Jun 13 '20 at 09:34

3 Answers


You are submitting the application in cluster mode. This means a Spark driver application will be created somewhere on the cluster, and the file must exist there.

That is why, with Spark, it is recommended to use a distributed file system like HDFS or S3.
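Without a distributed file system, the jar would have to sit at the same absolute path on every node that might host the driver. A rough sketch of that workaround, assuming passwordless ssh and hypothetical worker hostnames (worker1..worker3):

# worker1..3 are hypothetical hostnames; adjust to your cluster
for host in worker1 worker2 worker3; do
  ssh me@$host "mkdir -p /home/me/workspace/myproj/target/scala-2.11"
  scp /home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar \
      me@$host:/home/me/workspace/myproj/target/scala-2.11/
done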

Thomas Decaux

> UPDATE
>
> I think I understand a little more about Spark and why I had this problem and the spark-submit error: ClassNotFoundException. The key point is that even though the word REST is used here (REST URL: spark://127.0.1.1:6066 (cluster mode)), the application jar is not uploaded to the cluster after submission, which is different from my understanding. So the Spark cluster cannot find the application jar and cannot load the main class.

That's why you have to place the jar file on the master node OR put it into HDFS before the spark submit.

This is how to do it:

1.) Transferring the file to the master node with the scp command:

$ scp <file> <username>@<IP address or hostname>:<Destination>

For example:

$ scp mytext.txt tom@128.140.133.124:~/

2.) Transferring the file to HDFS:

$ hdfs dfs -put mytext.txt
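Once the jar is in HDFS in the same way, spark-submit can reference it by its HDFS URL, so whichever node runs the driver can fetch it. A sketch; the namenode host, port, and paths here are placeholders to adjust for your cluster:

$ spark-submit --master spark://192.168.42.80:32141 \
    --deploy-mode cluster \
    hdfs://<namenode-host>:8020/user/<username>/myproj-assembly-0.1.0.jar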

Hope I could help you.


In standalone cluster mode you want to pass the jar file via HDFS, because the driver may be started on any node in the cluster.

# first, upload the application jar to HDFS so every node can reach it
hdfs dfs -put xxx.jar /user/
# then submit, pointing at the jar by its HDFS URL
spark-submit --master spark://xxx:7077 \
--deploy-mode cluster \
--supervise \
--driver-memory 512m \
--total-executor-cores 1 \
--executor-memory 512m \
--executor-cores 1 \
--class com.xiyou.bi.streaming.game.common.DmMoGameviewOnlineLogic \
hdfs://xxx:8020/user/hutao/xxx.jar
HuTao