
Chapter 6, "Clusters", of the book "Mastering Spark with R" shows how to start the master node of a Spark standalone cluster from R:

# Load sparklyr, which provides spark_home_dir()
library(sparklyr)

# Retrieve the Spark installation directory
spark_home <- spark_home_dir()
# Build the path to the spark-class launcher script
spark_path <- file.path(spark_home, "bin", "spark-class")
# Start the cluster manager master node (KO on Windows?!)
system2(spark_path, "org.apache.spark.deploy.master.Master", wait = FALSE)

Unlike in this example, which shows the last command working on Linux, on my Windows machine the command runs without any warning but also without any effect: nothing is printed to the console, and no web interface is served at http://127.0.0.1:8080/.
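A plausible explanation (my assumption, not something the book states) is that the extension-less bin/spark-class file is a bash script, which Windows cannot execute directly; the Windows distribution ships a batch equivalent, spark-class2.cmd, in the same directory. Below is a minimal sketch of a Windows-side launch from R; the log-file names master.log and master.err are arbitrary choices, so that a failure at least leaves a trace:

library(sparklyr)

# Locate the Spark installation and the Windows batch launcher
spark_home <- spark_home_dir()
spark_class_cmd <- file.path(spark_home, "bin", "spark-class2.cmd")
# Run the launcher through cmd.exe and redirect output to log files
system2("cmd",
        args = c("/c", shQuote(spark_class_cmd, type = "cmd"),
                 "org.apache.spark.deploy.master.Master"),
        wait = FALSE, stdout = "master.log", stderr = "master.err")

If the master comes up, its web UI should then answer at http://127.0.0.1:8080/.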

I could, however, successfully start a master node from a Windows cmd prompt like this:

rem Point SPARK_HOME at the local Spark installation
set SPARK_HOME=C:\Users\username\AppData\Local\spark\spark-2.4.3-bin-hadoop2.7
rem Load the Spark environment settings
cmd /k %SPARK_HOME%\conf\spark-env.cmd
rem Start the standalone master (spark-class.cmd lives in %SPARK_HOME%\bin)
spark-class org.apache.spark.deploy.master.Master

and add a worker node with:

spark-class org.apache.spark.deploy.worker.Worker spark://10.0.75.1:7077
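The same pattern should also work for starting a worker from R once the master is up; a sketch, assuming the master URL spark://10.0.75.1:7077 from the command above:

library(sparklyr)

spark_class_cmd <- file.path(spark_home_dir(), "bin", "spark-class2.cmd")
# The Worker class takes the master URL as a required positional argument
system2("cmd",
        args = c("/c", shQuote(spark_class_cmd, type = "cmd"),
                 "org.apache.spark.deploy.worker.Worker",
                 "spark://10.0.75.1:7077"),
        wait = FALSE, stdout = "worker.log", stderr = "worker.err")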

but then, when connecting to the master from R with:

sc <- spark_connect(spark_home = spark_install_find(version = "2.4.3")$sparkVersionDir,
                    master = "spark://localhost:7077")
if (file.exists("codes_ages")) unlink("codes_ages", TRUE)
codes.ages.df <- spark_read_csv(sc, name = "codes_ages",
                                path = paste0(datadir, "/age.txt"),
                                header = FALSE, delimiter = " ")

either the first or the third command hangs and never finishes. (datadir is set, and age.txt consists of about 20 short lines.)
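When spark_connect() or spark_read_csv() hangs like this, one common standalone-cluster symptom is a master that is reachable but has no registered worker, so jobs wait indefinitely for executors. A minimal sketch of sanity checks from R, assuming the default ports used above (8080 for the web UI, 7077 for RPC):

# Open the master web UI: it should list the worker as ALIVE
utils::browseURL("http://127.0.0.1:8080/")

# Check that the master RPC port accepts a TCP connection
con <- tryCatch(
  socketConnection("localhost", port = 7077, open = "r+b", timeout = 5),
  error = function(e) e
)
if (inherits(con, "connection")) close(con) else print(con)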

So how can I start the master from R, or how can I connect to a (local) master node?

user1767316

0 Answers