may I ask you have you actually installed the spark into that folder?
Can you show the result of ls
command in /home/ubuntu/
folder?
And sessionInfo()
in R?
Let me please share with you how I am using the custom folder structure.
It is on Win, not Ubuntu but I guess it won't make much of the difference.
Using the most recent dev edition
If you would check on GitHub the RStudio guys are updating sparklyr almost every day fixing numerous reported bugs:
devtools::install_github("rstudio/sparklyr")
in my case only installation of sparklyr_0.4.12
has resolved problem with Spark 2.0 under Windows
Checking Spark availability
please check if version you're inquiring is available:
spark_available_versions()
You should see something like the line below, which indicates that the version you indend to use is actually available for your sparklyr package.
[13] 2.0.0 2.7 spark_install(version = "2.0.0", hadoop_version = "2.7")
Installation of Spark
Just to keep the order you may like to install spark in other location rather then home folder of RStudio cache.
options(spark.install.dir = "c:/spark")
Once you are sure the desire version is available it is time to install spark
spark_install(version = "2.0.0", hadoop_version = "2.7")
I'd check if it is install correctly (change it for shell ls
if needed)
cd c:/spark
dir (in Win) | ls (in Ubuntu)
Now specify the location of the edition you want to use:
Sys.setenv(SPARK_HOME = 'C:/spark/spark-2.0.0-bin-hadoop2.7')
And finally enjoy the creation of connection:
sc <- spark_connect(master = "local")
I hope it helps.