I am running SparkR 2.0.0 from the terminal, and I can run R commands. However, how do I create a .r script and run it within the Spark session?
SparkR uses the standard R interpreter, so the same rules apply. If you want to execute an external script inside the current session, use the `source` function.
## Welcome to
## ____ __
## / __/__ ___ _____/ /__
## _\ \/ _ \/ _ `/ __/ '_/
## /___/ .__/\_,_/_/ /_/\_\ version 2.1.0-SNAPSHOT
## /_/
##
##
## SparkSession available as 'spark'.
> sink("test.R")
> cat("print(head(createDataFrame(mtcars)))")
> sink()
> source("test.R")
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If you want to submit a standalone script outside an existing SparkR session, you should initialize the required context in the script itself. After that you can execute it using `SPARK_HOME/bin/spark-submit` (preferred option) or even `Rscript`.
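As a sketch of the standalone case (the file name and app name are illustrative, not from the original answer), such a script initializes its own session before doing any work, since no interactive shell provides one:

```r
# standalone.R -- minimal standalone SparkR script (names are illustrative)
library(SparkR)

# Create the SparkSession ourselves; spark-submit does not do this for R scripts
sparkR.session(appName = "standalone-example")

df <- createDataFrame(mtcars)
print(head(df))

# Release Spark resources when done
sparkR.session.stop()
```

It can then be run with `$SPARK_HOME/bin/spark-submit standalone.R`, or with `Rscript standalone.R` provided `SPARK_HOME` is set and the SparkR package is on the R library path.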

zero323
Hi, thanks for making stack overflow a great place! Would you happen to know any documentation to best learn sparkR? – Jonathan Sep 21 '16 at 20:20
Not really but excluding a few new additions (`*apply` methods) you can use any Spark SQL guide. – zero323 Sep 21 '16 at 20:39