16

I am trying to import and use pyspark with anaconda.

After installing Spark and setting the $SPARK_HOME variable, I tried:

$ pip install pyspark

This doesn't work (of course): I discovered that I need to tell Python to look for pyspark under $SPARK_HOME/python/. The problem is that to do that I need to set $PYTHONPATH, and Anaconda doesn't use that environment variable.

I tried copying the contents of $SPARK_HOME/python/ to ANACONDA_HOME/lib/python2.7/site-packages/, but that didn't work either.

Is there a way to use pyspark with Anaconda?

raulk
farhawa

7 Answers

20

This may have only become possible recently, but I used the following and it worked perfectly. After this, I am able to 'import pyspark as ps' and use it with no problems.

conda install -c conda-forge pyspark
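To confirm the package landed in the environment you are actually running, a quick check along these lines may help. This is only a minimal sketch using the standard library; the helper name `is_importable` is made up for illustration and is not part of pyspark or conda.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the current interpreter can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable shows which Python (and therefore which conda env) is running.
    print("interpreter:", sys.executable)
    print("pyspark importable:", is_importable("pyspark"))
```

If the interpreter path printed is not inside your Anaconda install, the shell is probably picking up a different Python than the one conda installed into.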

Gibolt
mewa6
  • 2
    It is amazing how in the past two years conda-forge has made things take 30 seconds that used to take 3 days. – eric Dec 16 '19 at 05:12
9

You can simply set the PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON environment variables to use either the root Anaconda Python or a specific Anaconda environment. For example:

export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python

or

export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/envs/foo/bin/ipython 
export PYSPARK_PYTHON=$ANACONDA_ROOT/envs/foo/bin/python 

When you use $SPARK_HOME/bin/pyspark or $SPARK_HOME/bin/spark-submit, it will pick the correct environment. Just remember that PySpark has to use the same Python version on all machines.
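The same two variables can be derived programmatically, for example when generating a launch script. The sketch below assumes the standard Anaconda layout shown above (`bin/` for the root install, `envs/<name>/bin/` for a named environment); the function name `pyspark_python_env` and the example path are hypothetical.

```python
from pathlib import PurePosixPath


def pyspark_python_env(anaconda_root, env_name=None):
    """Build PYSPARK_DRIVER_PYTHON / PYSPARK_PYTHON for the root install or a named env."""
    root = PurePosixPath(anaconda_root)
    # Named environments live under <root>/envs/<name>; otherwise use the root install.
    bin_dir = (root / "envs" / env_name / "bin") if env_name else (root / "bin")
    return {
        "PYSPARK_DRIVER_PYTHON": str(bin_dir / "ipython"),
        "PYSPARK_PYTHON": str(bin_dir / "python"),
    }


if __name__ == "__main__":
    # "/home/user/anaconda2" is a placeholder path; "~" is not expanded by this sketch.
    for key, value in pyspark_python_env("/home/user/anaconda2", "foo").items():
        print(f"export {key}={value}")
```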

On a side note, using PYTHONPATH should work just fine, even if it is not recommended.

OneCricketeer
zero323
  • Thanks for the answer. Could I import `pyspark` in a standalone mode? I mean `import pyspark` – farhawa Nov 19 '15 at 22:13
  • 2
    It is not a very precise description... What exactly doesn't work? And just for the record - using `PYTHONPATH` should work just fine. It is just not recommended. – zero323 Nov 19 '15 at 22:46
2

Here is the complete set of environment variables I had to put in my .bashrc to get this to work in both scripts and notebooks:

export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python

export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export PYLIB=/opt/spark-2.1.0-bin-hadoop2.7/python/lib

export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
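The PYTHONPATH lines above hard-code the py4j version, which changes between Spark releases. As a rough sketch, the same effect can be had from inside Python by globbing for the zip and extending `sys.path`. The function names here are made up for illustration, and the layout (`python/` and `python/lib/py4j-*-src.zip` under SPARK_HOME) is assumed from the exports above.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the sys.path entries Spark's Python bindings need."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j version in the zip name differs between Spark releases, so glob for it.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home=None):
    """Prepend the Spark paths to sys.path and return the entries that were added."""
    # Falls back to the SPARK_HOME variable; raises KeyError if it is not set.
    spark_home = spark_home or os.environ["SPARK_HOME"]
    added = [p for p in spark_python_paths(spark_home) if p not in sys.path]
    sys.path[:0] = added
    return added
```

Calling `add_spark_to_sys_path()` before `import pyspark` only affects the current process, so nothing in .bashrc needs to change when Spark is upgraded.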
Tom Whittaker
2

Perhaps this can help someone. According to the Anaconda documentation, you install FindSpark as follows:

conda install -c conda-forge findspark 

It was only after installing it as shown above that I was able to import FindSpark. No export statements required.
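Once findspark is installed, using it from a script or notebook might look roughly like this. `findspark.init()` is the library's entry point; the wrapper `init_with_findspark` is a hypothetical helper added here so that a missing install or missing Spark is reported instead of raising.

```python
def init_with_findspark(spark_home=None):
    """Try to make pyspark importable via findspark; return True on success."""
    try:
        import findspark
    except ImportError:
        # findspark itself is not installed in this environment.
        return False
    try:
        # With spark_home=None, findspark tries to locate Spark on its own
        # (for example through the SPARK_HOME variable).
        findspark.init(spark_home)
    except Exception:
        # Deliberately broad for this sketch: no usable Spark install was found.
        return False
    return True


if __name__ == "__main__":
    print("findspark initialised:", init_with_findspark())
```

If this prints True, `import pyspark` should then work in the same process.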

Tshilidzi Mudau
1

I don't believe you need to, or even can, install pyspark as a module. Instead, I extended my $PYTHONPATH in my ~/.bash_profile as follows:

export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH

After that, I was able to import pyspark as ps. Hope that works for you too.

PC3SQ
  • What exists in `$SPARK_HOME/python/build`? I can't find the `build` directory included in my spark distribution (spark-2.1.0-bin-hadoop2.4). – Tarrasch Jan 04 '17 at 09:00
0

You could try running these lines in the Anaconda PowerShell prompt instead. They come straight from https://anaconda.org/conda-forge/findspark

To install this package with conda run one of the following:
conda install -c conda-forge findspark
conda install -c conda-forge/label/gcc7 findspark
conda install -c conda-forge/label/cf201901 findspark
Jerrold110
0

Try this command, which installs a specific (lower) version of pyspark:

pip install pyspark==3.x.x
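After pinning a version, it can be worth checking which one actually ended up in the environment. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None if pyspark is not installed for this interpreter.
    print("pyspark:", installed_version("pyspark"))
```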
