I'm trying to connect my RStudio Server to my DSE Analytics cluster.
The setup:
- CentOS 7
- openjdk-1.8
- RStudio Server v1.0.136 (with the latest version of sparklyr, installed via `devtools::install_github("rstudio/sparklyr")`)
- DSE 5.0 (Spark 1.6.2)
- 5 DSE Analytics nodes in one DC of a cluster (the cluster also has another DC for OLTP)
- RStudio Server running on a VM with a standalone DSE Analytics install
Since, unlike in the sparklyr tutorial, I'm bringing my own (DSE's) Spark, `SPARK_HOME` was not set. Nor was `JAVA_HOME`. So:
```r
Sys.setenv(JAVA_HOME = '/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64')
Sys.setenv(SPARK_HOME = '/usr/share/dse/spark/')
```
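A minimal sketch for sanity-checking that these two variables point where they should. `check_spark_env()` is a hypothetical helper, not part of sparklyr; it only checks that `bin/spark-submit` exists under `SPARK_HOME` and, assuming a JDK layout, that `bin/java` exists under `JAVA_HOME`.

```r
# Hypothetical helper (not part of sparklyr): reports whether SPARK_HOME and
# JAVA_HOME are set and contain the launchers Spark relies on.
check_spark_env <- function(spark_home = Sys.getenv("SPARK_HOME"),
                            java_home = Sys.getenv("JAVA_HOME")) {
  c(
    # spark-submit is the script sparklyr launches to start its backend
    spark_submit = nzchar(spark_home) &&
      file.exists(file.path(spark_home, "bin", "spark-submit")),
    # assumes a JDK layout with bin/java directly under JAVA_HOME
    java = nzchar(java_home) &&
      file.exists(file.path(java_home, "bin", "java"))
  )
}

check_spark_env()
```

Both entries should be `TRUE` before calling `spark_connect()`; a `FALSE` points at the variable to fix.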
My config.yml (I found the example here):
```yaml
spark.cassandra.connection.host: <IP of one node>
spark.cassandra.auth.username: cassandra
spark.cassandra.auth.password: <PW>
sparklyr.defaultPackages:
  - com.databricks:spark-csv_2.11:1.3.0
  - com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M1
  - com.datastax.cassandra:cassandra-driver-core:3.0.2
```
My session info:
```
> devtools::session_info()
Session info --------------------------
 setting  value
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, linux-gnu
 ui       RStudio (1.0.136)
 language (EN)
 collate  en_US.UTF-8
 tz       America/Mexico_City
 date     2017-02-02

Packages ----------------------------------------
 package    * version    date       source
 assertthat   0.1        2013-12-06 CRAN (R 3.3.2)
 backports    1.0.5      2017-01-18 CRAN (R 3.3.2)
 base64enc    0.1-3      2015-07-28 CRAN (R 3.3.2)
 config       0.2        2016-08-02 CRAN (R 3.3.2)
 curl         2.3        2016-11-24 CRAN (R 3.3.2)
 DBI          0.5-1      2016-09-10 CRAN (R 3.3.2)
 devtools     1.12.0     2016-12-05 CRAN (R 3.3.2)
 digest       0.6.12     2017-01-27 CRAN (R 3.3.2)
 dplyr        0.5.0      2016-06-24 CRAN (R 3.3.2)
 git2r        0.18.0     2017-01-01 CRAN (R 3.3.2)
 htmltools    0.3.5      2016-03-21 cran (@0.3.5)
 httpuv       1.3.3      2015-08-04 cran (@1.3.3)
 httr         1.2.1      2016-07-03 CRAN (R 3.3.2)
 jsonlite     1.2        2016-12-31 CRAN (R 3.3.2)
 magrittr     1.5        2014-11-22 CRAN (R 3.3.2)
 memoise      1.0.0      2016-01-29 CRAN (R 3.3.2)
 mime         0.5        2016-07-07 CRAN (R 3.3.2)
 packrat      0.4.8-1    2016-09-07 CRAN (R 3.3.2)
 R6           2.2.0      2016-10-05 CRAN (R 3.3.2)
 Rcpp         0.12.9     2017-01-14 CRAN (R 3.3.2)
 rprojroot    1.2        2017-01-16 CRAN (R 3.3.2)
 rstudioapi   0.6        2016-06-27 CRAN (R 3.3.2)
 shiny        1.0.0      2017-01-12 cran (@1.0.0)
 sparklyr   * 0.5.3-9000 2017-02-02 Github (rstudio/sparklyr@bd4aee0)
 tibble       1.2        2016-08-26 CRAN (R 3.3.2)
 withr        1.0.2      2016-06-20 CRAN (R 3.3.2)
 xtable       1.8-2      2016-02-05 cran (@1.8-2)
 yaml         2.1.14     2016-11-12 CRAN (R 3.3.2)
```
Now, when I try to create the Spark context, this is what I get:
```
> sc <- spark_connect(master = "spark://<IP of one node>", config = spark_config(file = "config.yml"), version = "1.6.2")
Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (646): Gateway in port (8880) did not respond.
Path: /usr/share/dse/spark/bin/spark-submit
Parameters: --class, sparklyr.Backend, --jars, '/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/spark-csv_2.11-1.3.0.jar','/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/commons-csv-1.1.jar','/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/univocity-parsers-1.5.1.jar', '/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 646
---- Output Log ----
Failed to find Spark assembly in /usr/share/dse/spark/lib.
You need to build Spark before running this program.
---- Error Log ----
```
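The output log appears to come from the Spark launcher script rather than from sparklyr itself: as far as I can tell from the Apache Spark 1.6 sources, `bin/spark-class` (for a release layout, i.e. when `$SPARK_HOME/RELEASE` exists) looks in `$SPARK_HOME/lib` for a jar matching `^spark-assembly.*hadoop.*\.jar$` and prints exactly this message when none is found. A minimal sketch, assuming that pattern, to list which matching jars exist under the `SPARK_HOME` set above; `find_spark_assembly()` is a hypothetical helper, not a sparklyr function.

```r
# Hypothetical diagnostic (not a sparklyr function). Assumed to mirror the
# check in Apache Spark 1.6's bin/spark-class for a release layout: look in
# $SPARK_HOME/lib for a jar named like spark-assembly-*hadoop*.jar.
find_spark_assembly <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  if (!nzchar(spark_home)) return(character(0))  # SPARK_HOME not set
  lib_dir <- file.path(spark_home, "lib")
  if (!dir.exists(lib_dir)) return(character(0))
  list.files(lib_dir, pattern = "^spark-assembly.*hadoop.*\\.jar$",
             full.names = TRUE)
}

jars <- find_spark_assembly()
if (length(jars) == 0) {
  message("No spark-assembly jar found under ",
          file.path(Sys.getenv("SPARK_HOME"), "lib"))
} else {
  print(jars)
}
```

An empty result for DSE's Spark directory would be consistent with the output log: the stock `spark-submit` is being run against a layout that lacks the assembly jar it expects.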
From this output, my guess is that sparklyr is not recognizing DSE Analytics' Spark. As I understand it, DSE's Spark is deeply integrated with Cassandra through its connector; it even has its own `dse spark-submit`. I'm sure I'm passing the wrong configuration to sparklyr; I'm just lost as to what to pass to it. Any help is welcome. Thank you.
Edit: I hit the same error with `sc <- spark_connect(master = "local")`.