I was trying to run spark-submit and I got "Failed to find Spark assembly JAR. You need to build Spark before running this program." When I try to run spark-shell I get the same error. What do I have to do in this situation?
- Need more info. How do you package your project? What command line do you use to launch spark-submit? – gasparms Dec 23 '14 at 11:18
- I package it with the command: mvn package – Silver Jay Dec 23 '14 at 11:31
10 Answers
On Windows, I found that if Spark is installed in a directory that has a space in the path (C:\Program Files\Spark), launching it will fail. Move it to the root or another directory with no spaces.

- You can also set the environment variable to C:\Progra~1\Spark if your path has a space. This would work, and it worked for me. – Nikunj Kakadiya Nov 15 '18 at 07:34
- Thank you. I actually logged into my account just to vote this up. – Zeeshan Qureshi Apr 15 '20 at 10:55
Your Spark package doesn't include compiled Spark code. That's why you got the error message from the spark-submit and spark-shell scripts.
You have to download one of the pre-built versions (the "Choose a package type" section) from the Spark download page.
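For example, a minimal sketch of fetching a pre-built package from the Apache archive (the version and Hadoop profile here are only examples; pick whatever the download page currently offers):

# a minimal sketch; the version and Hadoop profile are examples, not a recommendation
wget https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz
tar -xzf spark-3.4.1-bin-hadoop3.tgz
cd spark-3.4.1-bin-hadoop3
./bin/spark-shell    # should now start without the "build Spark" error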

- I get the same error, and have downloaded a prebuilt version of Spark. Running Windows. – Marin Apr 27 '15 at 09:07
- @Marin If you are asking a question about your problem, would you please create a new question and describe your environment (Spark version, OS version, Java version, etc.)? – suztomo Apr 27 '15 at 09:12
- Hi, thanks for the solution. Do you know why "compiled Spark code" is needed (and exactly what it is)? I had installed my Spark using Python's package manager with `python -m pip install pyspark`. Just curious if someone could share the actual reason why Spark can't work without this, so that I can dig deeper into my case. Thanks! – akki May 23 '22 at 19:14
- @akki In Java, you need to compile source code into class files to run it. – suztomo May 23 '22 at 23:23
Try running mvn -DskipTests clean package first to build Spark.
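For context, a minimal sketch of the full sequence, assuming you are starting from a fresh source checkout (a JDK and Git are assumed to be installed):

# a minimal sketch, assuming a Spark source checkout and a JDK already installed
git clone https://github.com/apache/spark.git
cd spark
./build/mvn -DskipTests clean package   # bundled Maven wrapper; plain mvn works too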

If your Spark binaries are in a folder where the folder name has spaces (for example, "Program Files (x86)"), it won't work. I changed it to "Program_Files", and then the spark-shell command worked in cmd.

In my case, I installed Spark with pip3 install pyspark on a macOS system, and the error was caused by an incorrect SPARK_HOME variable. It works when I run a command like the one below:
PYSPARK_PYTHON=python3 SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark python3 wordcount.py a.txt
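If you are not sure where pip put the package, a minimal sketch for deriving SPARK_HOME from the installed pyspark package itself (assumes a bash-like shell; wordcount.py is the script from above):

# a minimal sketch: derive SPARK_HOME from wherever pip installed pyspark
SPARK_HOME="$(python3 -c 'import os, pyspark; print(os.path.dirname(pyspark.__file__))')"
export SPARK_HOME
export PYSPARK_PYTHON=python3
python3 wordcount.py a.txt    # wordcount.py is the asker's own script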

- Go to SPARK_HOME. Note that your SPARK_HOME variable should not include /bin at the end. Keep that in mind when you're adding it to your path, like this: export PATH=$SPARK_HOME/bin:$PATH
- Run export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g" to allot more memory to Maven.
- Run ./build/mvn -DskipTests clean package and be patient. It took my system 1 hour and 17 minutes to finish.
- Run ./dev/make-distribution.sh --name custom-spark --pip. This is just for Python/PySpark; you can add more flags for Hive, Kubernetes, etc.

Running pyspark or spark-shell will now start PySpark and Spark respectively; the sketch below shows a quick way to verify the setup.
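As a quick sanity check after the build, a minimal sketch (the install path is a placeholder for your own build directory):

# a minimal sketch; /path/to/spark is a placeholder for your checkout/build dir
export SPARK_HOME=/path/to/spark      # note: no trailing /bin
export PATH="$SPARK_HOME/bin:$PATH"
spark-submit --version                # should print the Spark version banner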

If you have downloaded the binary and are getting this exception, then please check whether your SPARK_HOME path contains spaces, like "apache spark"/bin. Just removing the spaces will make it work.
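A minimal sketch of that check, assuming a bash-like shell (on Windows, inspect the variable in the environment settings dialog instead):

# a minimal sketch: spaces in SPARK_HOME are a common cause of this error
case "$SPARK_HOME" in
  *" "*) echo "SPARK_HOME contains spaces: $SPARK_HOME" ;;
  *)     echo "SPARK_HOME looks OK: $SPARK_HOME" ;;
esac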

Just to add to @jurban1997's answer:
If you are running Windows, then make sure that the SPARK_HOME and SCALA_HOME environment variables are set up correctly. SPARK_HOME should be set so that %SPARK_HOME%\bin\spark-shell.cmd exists.

For a Windows machine with the pre-built version as of today (21.01.2022): in order to cover all the edge cases you may have and avoid tedious guesswork about what exactly is not configured properly:
- Find spark-class2.cmd and open it in a text editor.
- Inspect the arguments of commands starting with call or if exists by typing the arguments into Command Prompt, like this:
- Open Command Prompt. (For PowerShell you need to print the variable another way.)
- Copy-paste %SPARK_HOME%\bin\ as is and press Enter.
- If you now see something like bin\bin in the displayed path, then you have appended \bin to your %SPARK_HOME% environment variable.
- You also have to add the path to spark\bin to your PATH variable, or it will not find the spark-submit command.
- Try out and correct every path variable that the script in this file uses, and you should be good to go.
- After that, enter spark-submit ... You may now encounter the missing Hadoop winutils.exe, for which you can go get the tool and paste it where spark-submit.cmd is located.

Spark Installation:
For a Windows machine:
- Download spark-2.1.1-bin-hadoop2.7.tgz from https://spark.apache.org/downloads.html, unzip it, paste the spark folder into the C:\ drive, and set the environment variable.
- If you don't have Hadoop, create a Hadoop folder and a Bin folder inside it. Download the winutils file from https://codeload.github.com/gvreddy1210/64bit/zip/master, paste winutils.exe into the Hadoop\bin folder, and set the environment variable for C:\hadoop\bin.
- Create a temp\hive folder in the C:\ drive and give that folder full permission, like: C:\Windows\system32> C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive
- Open a command prompt, first run C:\hadoop\bin> winutils.exe, and then navigate to C:\spark\bin> and run spark-shell.
