
I have tried several tutorials on setting up Spark and Hadoop in a Windows environment, especially alongside R. This one resulted in this error by the time I hit figure 9:

[Screenshot: error message produced by the first tutorial]

This tutorial from RStudio is giving me issues as well. When I get to the

sc <- spark_connect(master = "local")

step, I get this familiar error:

Error in force(code) : 
  Failed while connecting to sparklyr to port (8880) for sessionid (1652): Gateway in port (8880) did not respond.
    Path: C:\Users\jvangeete\spark-2.0.2-bin-hadoop2.7\bin\spark-submit2.cmd
    Parameters: --class, sparklyr.Backend, "C:\Users\jvangeete\Documents\R\win-library\3.3\sparklyr\java\sparklyr-2.0-2.11.jar", 8880, 1652


---- Output Log ----
The system cannot find the path specified.

---- Error Log ----

This port issue is similar to the one I get when passing "yarn-client" as the master inside spark_connect(...), following Ms. Zaidi's tutorial, here. (That tutorial has its own issues, which I've written up on a board, here, if anyone's interested.)

The TutorialsPoint walkthrough gets me through fine if I first install an Ubuntu VM, but I'm using Microsoft R Open, so I'd like to figure this out in Windows, not least because Mr. Emaasit, in the first tutorial, is apparently able to run a command I cannot: .\bin\sparkR.

Most generally I am trying to understand how to install and run Spark together with R using preferably sparklyr, in Windows.
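For reference, the usual sparklyr setup flow on Windows looks roughly like the sketch below (a minimal outline, assuming Java is installed and on the PATH; spark_install() downloads and manages its own Spark copy, so no manual extraction is involved):

```r
# Minimal sparklyr setup sketch (assumes Java is installed).
install.packages("sparklyr")           # from CRAN
library(sparklyr)
spark_install(version = "2.0.2")       # downloads a managed local Spark install
sc <- spark_connect(master = "local")  # local mode, no Hadoop/YARN cluster needed
spark_disconnect(sc)
```

With this flow, sparklyr controls SPARK_HOME itself, which sidesteps mismatches between a manually extracted Spark directory and the one sparklyr expects.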

UPDATE 1: This is what's in the directories:

[Screenshot: contents of the Spark installation directories]

UPDATE 2: This is my R session and system info:

platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          3.1                         
year           2016                        
month          06                          
day            21                          
svn rev        70800                       
language       R                           
version.string R version 3.3.1 (2016-06-21)
nickname       Bug in Your Hair   

[Screenshot: system information]

  • Where is your spark installed? Your picture of an error references the directory `C:\Apache\Spark-2.0.2`. Does it actually exist at the location? Or is it installed somewhere else? Does it have `bin\sparkR` in it? Your other code references a different file location, `C:\Users\jvangeete\spark-2.0.2-bin-hadoop2.7`, is your installation there instead? Is the library filepath correct in the `Parameters` line? – Gregor Thomas Nov 16 '16 at 21:20
  • Following along with that first tutorial, I simply extracted to that directory. There is a `bin\sparkR` in there, yes, along with `spark-shell` which doesn't run either, with the same error. I will update the post with a picture of what is in the directory @Gregor – d8aninja Nov 16 '16 at 21:22
  • @Gregor to be clear, the other file location in the R code is referenced as a result of where the call to `install.packages("sparklyr")` put the files. There is ALSO `bin/sparkR` and `bin/spark-shell` in that directory. – d8aninja Nov 16 '16 at 21:31

1 Answer
  1. Download the Spark/Hadoop tar from http://spark.apache.org/downloads.html
  2. Install the sparklyr package from CRAN
  3. Run spark_install_tar(tarfile = "path/to/spark_hadoop.tar")

If you still get the error, untar the archive manually and set the SPARK_HOME environment variable to point to the untarred Spark directory.

Then try executing the following in the R console:

library(sparklyr)
sc <- spark_connect(master = "local")
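Put together, the manual-fallback path above looks roughly like this sketch (the SPARK_HOME path is taken from the question's error log as an example; adjust it to wherever you actually untarred Spark):

```r
# Sketch of the manual fallback: point SPARK_HOME at the untarred Spark
# directory before connecting (example path, adjust to your machine).
library(sparklyr)
Sys.setenv(SPARK_HOME = "C:/Users/jvangeete/spark-2.0.2-bin-hadoop2.7")
sc <- spark_connect(master = "local")
spark_disconnect(sc)
```

Alternatively, spark_connect() accepts a spark_home argument directly, which avoids mutating the environment for the whole session.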

– DSBLR