apache-spark - Error when starting pyspark on windows

Question

I'm trying to experiment with MLlib on Windows with Python. It seems I need Spark, which in turn needs Hadoop. I've installed Anaconda2, which includes Python 2.7, numpy, etc.

I've been following this recipe, which has mostly gotten me where I need to go, but I'm stuck on this last error:

Python 2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 19 2016, 13:29:36) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "C:\spark\bin\..\python\pyspark\shell.py", line 43, in <module>
    spark = SparkSession.builder\
  File "C:\spark\python\pyspark\sql\session.py", line 179, in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
  File "C:\spark\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
  File "C:\spark\python\pyspark\sql\utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"

This output shows that there is no error about winutils.exe not being found.

Also, the exception originates on the Java side of py4j, but we've lost the Java backtrace because it was re-raised as an IllegalArgumentException.
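The `raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)` line in the traceback suggests the Java stack trace is passed into the Python exception instead of being discarded. A minimal sketch of how it might be read back, assuming the exception object exposes it as `stackTrace` with the message as `desc` (attribute names inferred from that call, and they may differ between Spark versions). The helper names `java_stack_trace` and `describe` are hypothetical:

```python
def java_stack_trace(exc):
    """Return the Java-side stack trace attached to an exception, if any."""
    # Assumption: PySpark's captured exceptions keep the JVM trace in a
    # `stackTrace` attribute; other exceptions simply will not have it.
    trace = getattr(exc, "stackTrace", None)
    return trace if trace else "<no Java stack trace attached>"


def describe(exc):
    """Format the exception type, its message, and the Java trace together."""
    # Assumption: the message is stored in `desc`; fall back to str(exc).
    desc = getattr(exc, "desc", None) or str(exc)
    return "{0}: {1}\n{2}".format(type(exc).__name__, desc, java_stack_trace(exc))


# Possible usage when creating the session by hand instead of via shell.py:
#
#     from pyspark.sql import SparkSession
#     try:
#         spark = SparkSession.builder.getOrCreate()
#     except Exception as e:
#         print(describe(e))
```

The Java trace, if present, should name the underlying cause that the one-line `HiveSessionState` message hides.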

All guidance appreciated!

Cheers

This might help you http://stackoverflow.com/questions/42018550/pyspark-in-spite-of-adding-winutils-to-hadoop-home-getting-error-could-not-lo — BruceWayne, Mar 27 '17 at 01:44
Possible duplicate of [pyspark: In spite of adding winutils to HADOOP\_HOME, getting error: Could not locate executable null\bin\winutils.exe in the Hadoop binaries](http://stackoverflow.com/questions/42018550/pyspark-in-spite-of-adding-winutils-to-hadoop-home-getting-error-could-not-lo) — Jacek Laskowski, Mar 27 '17 at 11:44
@BruceWayne: already done that, it's part of the instructions on the link I posted. — Simon, Mar 27 '17 at 17:36
@JacekLaskowski: if you read the output posted you will see that there is no error regarding winutils, as I've already fixed that issue. — Simon, Mar 27 '17 at 17:39
Can you start Spark using `spark-shell` to confirm that `winutils` is not an issue? — Jacek Laskowski, Mar 27 '17 at 18:57
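
As a quick complement to the `spark-shell` check suggested above, the sketch below only verifies that `winutils.exe` sits at `%HADOOP_HOME%\bin\winutils.exe`, the location named in the linked question's error message. It does not prove that Spark can actually use the file. The function names `winutils_path` and `winutils_status` are hypothetical:

```python
import os


def winutils_path(env=None):
    """Return the expected winutils.exe path, or None if HADOOP_HOME is unset."""
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None
    return os.path.join(hadoop_home, "bin", "winutils.exe")


def winutils_status(env=None):
    """Describe whether winutils.exe exists under HADOOP_HOME."""
    path = winutils_path(env)
    if path is None:
        return "HADOOP_HOME is not set"
    if not os.path.isfile(path):
        return "winutils.exe not found at " + path
    return "winutils.exe found at " + path


if __name__ == "__main__":
    print(winutils_status())
```

Running this in the same console that launches `pyspark` matters, since the check reads the environment variables of that session.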

