
I am a Spark newbie and I want to run a Python script from the command line. I have tested pyspark interactively and it works. I get this error when trying to create the SparkContext (sc):

File "test.py", line 10, in <module>
    conf=(SparkConf().setMaster('local').setAppName('a').setSparkHome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin'))
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway
    SPARK_HOME = os.environ["SPARK_HOME"]
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'SPARK_HOME'

1 Answer


It seems like there are two problems here.

The first one is the path you use. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should probably be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.
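
If you are unsure which directory is the root, a quick sanity check (just a sketch, using the path from the question) is to confirm that the candidate directory contains the bin/ and python/ subdirectories, rather than being the bin/ directory itself:

import os

spark_home = "/home/dirk/spark-1.4.1-bin-hadoop2.6"

# The installation root should contain bin/ and python/;
# .../bin is one level too deep to serve as SPARK_HOME.
print(os.path.isdir(os.path.join(spark_home, "bin")))     # expected: True
print(os.path.isdir(os.path.join(spark_home, "python")))  # expected: True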

The second problem is the way you use setSparkHome. If you check its docstring, its goal is to

set path where Spark is installed on worker nodes

The SparkConf constructor assumes that SPARK_HOME on the master is already set. It calls pyspark.context.SparkContext._ensure_initialized, which calls pyspark.java_gateway.launch_gateway, which tries to access SPARK_HOME and fails.
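
To illustrate the difference, here is a minimal sketch (it assumes SPARK_HOME has already been exported so the gateway can launch) showing that setSparkHome only records the spark.home configuration entry for the workers; it never sets the environment variable that launch_gateway reads on the driver:

import os
from pyspark import SparkConf

# Exported up front so SparkConf.__init__ can launch the gateway.
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
conf = SparkConf().setSparkHome("/home/dirk/spark-1.4.1-bin-hadoop2.6")
print(conf.get("spark.home"))   # the worker-side setting that was just recorded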

To deal with this, you should set SPARK_HOME before you create SparkConf:

import os
from pyspark import SparkConf

# SPARK_HOME must be in the environment before SparkConf() is constructed
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
conf = (SparkConf().setMaster('local').setAppName('a'))
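
Putting it together, a minimal test.py along these lines (a sketch built from the paths, master URL, and app name in the question) should get past the KeyError and actually create the context:

import os

# Set before any PySpark object is constructed.
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster('local').setAppName('a')
sc = SparkContext(conf=conf)

print(sc.parallelize(range(10)).sum())   # quick smoke test: should print 45
sc.stop()

Alternatively, exporting SPARK_HOME in the shell before running the script, or launching it through spark-submit (which normally takes care of SPARK_HOME itself), avoids hard-coding the path in the code.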
    What if I am trying to connect to a remote machine? And setting "SPARK_HOME" when trying to run a client (in this case *pyspark*) doesn't really make sense, does it? Shouldn't this be removed? – Tanny Jan 10 '17 at 12:18