I’m trying to use Spark on my desktop computer, which runs Windows 7 (locally, not on a cluster, just to get some practice with it), through pySpark in an IPython notebook. I found a package called ‘findspark’ (available on pip) which can be used to avoid having to go through the full Spark setup.
Basically, I just download a Spark version pre-built for Hadoop from the official website, decompress the file, and then run something like this in Python:
import findspark
findspark.init('spark_directory')
import pyspark
sc = pyspark.SparkContext()
and I get a working Spark context without having to set up anything else. However, it runs quite slowly, to the point that if I run something like:
print(sc.parallelize([1]).collect())
it takes over a second to produce the result. More expensive computations are also quite slow, and RAM usage appears to be capped (i.e. it doesn’t exceed a certain point even when the computation would need more memory). For comparison, I ran the same operations on an already-configured Linux virtual machine that I downloaded for a MOOC, and everything ran a lot faster there.
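For what it’s worth, this is roughly how I’ve been measuring it in the notebook, with IPython’s %time magic (the exact figure varies between runs):

%time print(sc.parallelize([1]).collect())
# Wall time comes out above one second on my machine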
I was wondering what I can do or what I can configure to speed it up. My aim is to have a functional local instance of Spark to practice pyspark with, in an IPython notebook.
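In case it clarifies what I mean by “configure”: this is the kind of thing I imagine I should be passing, though I’m only guessing at which settings are relevant (‘local[*]’ to use all cores and ‘spark.driver.memory’ to raise the RAM cap are my assumptions, and ‘spark_directory’ is just a placeholder for my unpacked Spark folder):

import findspark
findspark.init('spark_directory')  # path to the unpacked Spark distribution

import pyspark

# My guesses at relevant settings: use all local cores,
# and allow the driver JVM more memory than the default.
conf = pyspark.SparkConf() \
    .setMaster('local[*]') \
    .set('spark.driver.memory', '4g')
sc = pyspark.SparkContext(conf=conf)

Is something along these lines the right approach, or is the slowness caused by something else entirely?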