I'm very new to Spark, so my issue might have a "no duh" answer that I'm just not seeing.
First, I downloaded Spark 1.5.2 and extracted it. In the `python` folder I tried to run pyspark, but it complained that it needed a `__main__.py`, so I copied `__init__.py` to `__main__.py` and started getting weird syntax errors. I realized I was using python 2.9, so I switched to 2.7 and got a different error:
Traceback (most recent call last):
File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\spark-1.5.2\python\pyspark\__main__.py", line 40, in <module>
from pyspark.conf import SparkConf
ImportError: No module named pyspark.conf
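For reference, here is roughly how I have been invoking it (the exact command may not be word for word what I typed), followed by what I understand from the docs to be the intended entry point, the launcher script in bin. The paths are just where I extracted things on my machine:

:: what I have been doing, from inside the extracted python folder
cd C:\spark-1.5.2\python
python pyspark
:: what I think the intended entry point is
cd C:\spark-1.5.2
bin\pyspark.cmd

If running the package folder directly isn't supposed to work at all, that alone might explain the import error.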
I found a question here that looked like the same error: What to set `SPARK_HOME` to?
So I set up my environment variables the same way (except with C:/spark-1.5.2 instead of C:/spark), but that didn't fix the error for me. Then I realized they were building Spark 1.4 from GitHub, so I made a new folder and followed their steps with that instead. I got stuck at this command:
build/mvn -DskipTests clean package
which failed with the error:
Java HotSpot(TM) Client VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Error occurred during initialization of VM
Could not reserve enough space for 2097152KB object heap
I tried adding "-XX:MaxHeapSize=3g", but nothing changed. Since the warning says "support was removed in 8.0", I also downloaded Java 7, but that didn't change anything either.
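In case my setup is the real problem, here is roughly what I have at the moment. The variable values are from memory, and MAVEN_OPTS is just where I guessed the heap flag should go, so treat this as a sketch rather than an exact transcript:

:: environment variables, following the linked question but with my paths
set SPARK_HOME=C:\spark-1.5.2
set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH%
:: my guess at where the heap flag belongs for the build
set MAVEN_OPTS=-XX:MaxHeapSize=3g
build/mvn -DskipTests clean package

If the heap flag is supposed to go somewhere other than MAVEN_OPTS, that might be why it had no effect.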
Thanks in advance