
I am running SparkR 2.0.0 from the terminal, and I can run R commands. However, how do I create a .R script and run it within the Spark session?

Jonathan

1 Answer


SparkR uses the standard R interpreter, so the same rules apply. If you want to execute an external script inside the current session, use the `source` function.

## Welcome to
##    ____              __ 
##   / __/__  ___ _____/ /__ 
##  _\ \/ _ \/ _ `/ __/  '_/ 
## /___/ .__/\_,_/_/ /_/\_\   version  2.1.0-SNAPSHOT 
##    /_/ 
##
##
## SparkSession available as 'spark'.
> sink("test.R")
> cat("print(head(createDataFrame(mtcars)))")
> sink()
> source("test.R")
##    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

If you want to submit a standalone script outside an existing SparkR session, you should initialize the required context in the script itself. After that you can execute it using `SPARK_HOME/bin/spark-submit` (the preferred option) or even `Rscript`.
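For the standalone case, a minimal script might look like the sketch below (the file name `my_script.R` and the app name are placeholders; it assumes a working Spark 2.x installation with the SparkR package on the library path):

```r
# my_script.R -- hypothetical standalone SparkR script.
# Unlike code sourced inside an interactive SparkR shell, a standalone
# script must load the package and create its own session before
# calling any Spark functions.
library(SparkR)
sparkR.session(appName = "standalone-example")

# Same example as above, now self-contained
df <- createDataFrame(mtcars)
print(head(df))

# Shut the session down cleanly when done
sparkR.session.stop()
```

It could then be run with `$SPARK_HOME/bin/spark-submit my_script.R`, or with `Rscript my_script.R` if SparkR is installed where `Rscript` can find it.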

zero323
  • Hi, thanks for making Stack Overflow a great place! Would you happen to know any documentation to best learn SparkR? – Jonathan Sep 21 '16 at 20:20
  • Not really, but excluding a few new additions (`*apply` methods) you can use any Spark SQL guide. – zero323 Sep 21 '16 at 20:39