I'm trying to experiment with MLlib on windows with python. So it seems I need SPARK which in turn needs HADOOP. I've installed Anaconda2 which contains python 2.7, numpy, etc.
I've been following this recipe which seems to me mostly getting me where I need to go, but I think I'm stuck on this last error:
Python 2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 19 2016, 13:29:36) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "C:\spark\bin\..\python\pyspark\shell.py", line 43, in <module>
spark = SparkSession.builder\
File "C:\spark\python\pyspark\sql\session.py", line 179, in getOrCreate
session._jsparkSession.sessionState().conf().setConfString(key, value)
File "C:\spark\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
File "C:\spark\python\pyspark\sql\utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
From this output it is clear to see that there is not error regarding winutils.exe not being found.
Also, the exception is originating in the java domain of py4j, but we've lost the back-trace thanks to the IllegalArgumentException.
All guidance appreciated!
Cheers