8

I'm on Windows 10. I was trying to get Spark up and running in a Jupyter Notebook alongside Python 3.5. I installed a pre-built version of Spark and set the SPARK_HOME environment variable. I installed findspark and ran the code:

import findspark
findspark.init()

I receive a ValueError:

ValueError: Couldn't find Spark, make sure SPARK_HOME env is set or Spark is in an expected location (e.g. from homebrew installation).

However, the SPARK_HOME variable is set. Here is a screenshot showing the list of environment variables on my system.

Has anyone encountered this issue, or does anyone know how to fix it? I only found an old discussion in which someone had set SPARK_HOME to the wrong folder, but I don't think that's my case.
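For reference, a quick way to check what the notebook's Python process actually sees (just a sanity check, nothing specific to my setup):

    import os

    # Prints the value the notebook's Python process sees. If this is None,
    # the notebook server was started before SPARK_HOME was set, or the
    # variable was set for a different user/session.
    print(os.environ.get("SPARK_HOME"))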

Andrea
  • I don't see the list of your environment variables in the screenshot you posted. Can you edit the image to highlight them so it's easier to find them? – dmlicht Jul 18 '16 at 09:02
  • Dear dmlicht, you are very right, it was not there! It may have been an issue with taking the screenshot in Windows... [Here](https://www.dropbox.com/s/1oxae74d9bsoz6x/env_var.png?dl=0) is a link to an image with the env variables. Thank you for spotting this! (I have also tried to do this with SPARK_HOME as a user variable rather than a system variable, but it didn't work.) – Andrea Jul 18 '16 at 10:26

10 Answers

23

I had the same problem and wasted a lot of time. There are two solutions:

  1. Copy the downloaded Spark folder somewhere under the C: drive and pass that path to init(), as below:

    import findspark
    findspark.init('C:/spark')
    
  2. Use findspark's find() function to locate the Spark folder automatically (see the sketch below):

    import findspark
    findspark.find()
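For example, a minimal sketch combining the two approaches (the printed path is simply whatever findspark detects on your machine):

    import findspark

    # find() returns the Spark home that findspark detects (from SPARK_HOME,
    # a pip-installed pyspark, or common install locations).
    spark_home = findspark.find()
    print(spark_home)

    # Passing the detected path to init() explicitly avoids any ambiguity.
    findspark.init(spark_home)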
    
emdi
  • How can I find the Spark folder? – Hamideh Oct 02 '20 at 05:16
  • @Hamideh it depends on how you've installed it. You can install it from the Spark home page, or use a package manager (e.g. Homebrew on macOS). Wherever it is installed, just use that file path as the environment variable on your system, or use the findspark solution. – suntzu Mar 27 '22 at 19:59
5

The environment variables are only picked up after a system reboot. It works after restarting your system.
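If a reboot is inconvenient, a possible workaround is to set the variable just for the current notebook process (a minimal sketch; "C:/spark" is only an example path, point it at your actual Spark folder):

    import os
    import findspark

    # Set SPARK_HOME for this Python process only; no reboot needed.
    # "C:/spark" is just an example path.
    os.environ["SPARK_HOME"] = "C:/spark"
    findspark.init()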

Manisha Galla
2

I had the same problem and solved it by installing "vagrant" and "VirtualBox". (Note, though, that I use Mac OS and Python 2.7.11.)

Take a look at this tutorial, which is for the Harvard CS109 course: https://github.com/cs109/2015lab8/blob/master/installing_vagrant.pdf

After "vagrant reload" on the terminal , I am able to run my codes without errors. NOTE the difference between the result of command "os.getcwd" shown in the attached images.

[screenshots comparing the output of os.getcwd() before and after "vagrant reload"]

Ancalagon BerenLuthien
  • Thank you - I'd be interested in knowing how to do this purely on Windows (i.e. without a virtual box) but will accept your response as the answer in a couple of days if nobody else responds. – Andrea Jul 18 '16 at 08:26
0

I had the same problem after installing Spark with "pip install pyspark findspark" in a conda environment.

The solution was to do this:

export SPARK_HOME=/Users/pete/miniconda3/envs/cenv3/lib/python3.6/site-packages/pyspark/
jupyter notebook

You'll have to substitute the name of your conda environment for cenv3 in the command above.
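Roughly the same thing can also be done from inside the notebook itself, without exporting anything first (a sketch reusing the example path above; again, substitute your own environment name and Python version):

    import findspark

    # Point findspark straight at the pip-installed pyspark package inside
    # the conda environment.
    findspark.init("/Users/pete/miniconda3/envs/cenv3/lib/python3.6/site-packages/pyspark/")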

juniper-
0

Restarting the system after setting up the environment variables worked for me.

Natty
0

I had the same problem and solved it by closing cmd and then opening it again. I had forgotten that after editing an environment variable on Windows you need to restart cmd.

hartono -
0

I got the same error. Initially, I had stored my Spark folder in the Documents directory. Later, when I moved it to the Desktop, it suddenly recognized all the system variables and findspark.init() ran without any error.

Try it out once.

0

This error may occur if you don't set the environment variables in your .bashrc file. Set your Python environment variables as follows:

export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.8.1-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/bin:$SPARK_HOME/python:$PATH
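These lines assume SPARK_HOME itself is already exported. A quick sanity check from Python that the directories the exports rely on actually exist (nothing here is specific to this answer):

    import os

    # Verify the directories referenced by the exports above exist.
    spark_home = os.environ["SPARK_HOME"]
    print(os.path.isdir(os.path.join(spark_home, "python")))
    print(os.path.isdir(os.path.join(spark_home, "bin")))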

 
Vladimir Samsonov
0

The simplest way I found to use Spark with Jupyter Notebook is:

1- Download Spark

2- Unzip it to the desired location

3- Open Jupyter Notebook in the usual way, nothing special

4- Now run the code below:

import findspark
findspark.init("location of spark folder")

# in my case it is:

import findspark
findspark.init("C:\\Users\\raj24\\OneDrive\\Desktop\\spark-3.0.1-bin-hadoop2.7")
Raj
0

In case anyone is using a newer Spark version (3.4.1): Make sure to include the "libexec" folder in your init() statement:

findspark.init("/opt/homebrew/Cellar/apache-spark/3.4.1/libexec/")