I am trying to write a very simple program using Spark in PyCharm, and my OS is Windows 8. I have run into several problems, all of which I somehow managed to fix except for one. When I run the code using pyspark.cmd everything works smoothly, but I have had no luck with the same code in PyCharm. There was a problem with the SPARK_HOME variable, which I fixed using the following code:

import sys
import os

# Point Spark at the local installation and make the pyspark
# package importable from this interpreter
os.environ['SPARK_HOME'] = "C:/Spark/spark-1.4.1-bin-hadoop2.6"
sys.path.append("C:/Spark/spark-1.4.1-bin-hadoop2.6/python")
sys.path.append('C:/Spark/spark-1.4.1-bin-hadoop2.6/python/pyspark')
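As an aside, pyspark usually also needs the bundled py4j sources on sys.path before the import succeeds. A sketch, assuming the py4j version shipped with Spark 1.4.1 (check the python/lib directory for the exact file name):

```python
import os
import sys

spark_home = "C:/Spark/spark-1.4.1-bin-hadoop2.6"  # example path
os.environ['SPARK_HOME'] = spark_home
# The pyspark package itself plus the py4j bridge it imports internally
sys.path.append(spark_home + "/python")
sys.path.append(spark_home + "/python/lib/py4j-0.8.2.1-src.zip")
```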

So now importing pyspark works fine:

from pyspark import SparkContext

The problem arises when I run the rest of my code:

logFile = "C:/Spark/spark-1.4.1-bin-hadoop2.6/README.md"
sc = SparkContext()
logData = sc.textFile(logFile).cache()
logData.count()

I receive the following error:

15/08/27 12:04:15 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified

I have added the Python path as an environment variable, and it works properly from the command line, but I could not figure out what the problem is with this code. Any help or comment is much appreciated.

Thanks

– ahajib

5 Answers

I had the same problem as you, and I fixed it with the following changes: set PYSPARK_PYTHON as an environment variable pointing to python.exe in PyCharm's Edit Configurations. Here is my example:

PYSPARK_PYTHON = D:\Anaconda3\python.exe

SPARK_HOME = D:\spark-1.6.3-bin-hadoop2.6

PYTHONUNBUFFERED = 1
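The same three settings can also be made in code, before pyspark is imported, instead of in PyCharm's run configuration. A sketch; the paths below are the examples from this answer, so adjust them to your installation:

```python
import os

# Tell Spark which interpreter to launch for worker processes,
# where Spark itself lives, and to keep Python output unbuffered.
os.environ['PYSPARK_PYTHON'] = r"D:\Anaconda3\python.exe"
os.environ['SPARK_HOME'] = r"D:\spark-1.6.3-bin-hadoop2.6"
os.environ['PYTHONUNBUFFERED'] = "1"
```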

– KE JI

After struggling with this for two days, I figured out what the problem was. I added the following to the PATH Windows environment variable:

C:/Spark/spark-1.4.1-bin-hadoop2.6/python/pyspark
C:\Python27

Remember, you need to change the directories to wherever Spark and Python are installed on your machine. I should also mention that I am using the prebuilt version of Spark, which has Hadoop included.
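For reference, the same PATH additions can be made from a Windows command prompt with setx (a sketch, assuming the paths from this answer; reopen the terminal afterwards for the change to take effect):

```shell
:: Append the pyspark directory and the Python install directory
:: to the user PATH (paths are examples -- adjust to your machine)
setx PATH "%PATH%;C:\Spark\spark-1.4.1-bin-hadoop2.6\python\pyspark;C:\Python27"
```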

Best of luck to you all.

– ahajib
  • I tried adding the above lines to my PATH variable, but still no success; I am still getting the error when executing from Eclipse. My PATH variable is like: %{EXISTING_PATH}%;%PY_HOME%;%PY_HOME%\Scripts;%SPARK_HOME%\bin;%SPARK_HOME%\python\pyspark. I also tried putting the Spark entries first and the Python ones second, as in %{EXISTING_PATH}%;%SPARK_HOME%\bin;%SPARK_HOME%\python\pyspark;%PY_HOME%;%PY_HOME%\Scripts, where %PY_HOME% = C:\Python2.7.11 and %SPARK_HOME% = C:\Spark1.6Hadoop2.6 – Adithya Mar 13 '16 at 15:34
  • I have done everything from the following link: http://enahwe.blogspot.in/p/philippe-rossignol-20150612-how-to.html?showComment=1457882166345 – Adithya Mar 13 '16 at 15:38
I have faced this problem too; it is caused by Python version conflicts across the different nodes of the cluster. It can be solved by pointing PYSPARK_PYTHON at an interpreter that is the same version on every node:

export PYSPARK_PYTHON=/usr/bin/python

and then starting:

pyspark
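One hedged way to make this setting survive restarts is to add it to Spark's conf/spark-env.sh on every node. A sketch, assuming SPARK_HOME is set and a standard Spark layout:

```shell
# Make PYSPARK_PYTHON permanent on each node (adjust the interpreter
# path so the version matches across the whole cluster)
echo 'export PYSPARK_PYTHON=/usr/bin/python' >> "$SPARK_HOME/conf/spark-env.sh"
```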
– fandyst
I had to set SPARK_PYTHONPATH as an environment variable pointing to the python.exe file, in addition to the PYTHONPATH and SPARK_HOME variables, as:

SPARK_PYTHONPATH=C:\Python27\python.exe
– Ramesh Maharjan
Add the following after your imports:

os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

and uninstall any conflicting JetBrains products for Python.
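Putting the pieces from this thread together, a minimal self-configuring preamble might look like this (a sketch; the SPARK_HOME path is an example, and the SparkContext lines are left commented out since they need a working Spark installation):

```python
import os
import sys

os.environ['SPARK_HOME'] = "C:/Spark/spark-1.4.1-bin-hadoop2.6"  # example path
# Use the interpreter running this script for both the driver and the
# workers, so Spark never has to find a bare "python" on PATH.
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable
sys.path.append(os.path.join(os.environ['SPARK_HOME'], "python"))

# With the environment in place, the original snippet should work:
# from pyspark import SparkContext
# sc = SparkContext()
# print(sc.textFile(os.path.join(os.environ['SPARK_HOME'], "README.md")).count())
```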

– JAGJ jdfoxito