Questions tagged [sparkr]

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.

SparkR exposes the Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster.

SparkR exposes the RDD API of Spark as distributed lists in R.

796 questions
4
votes
1 answer

Extracting Class Probabilities from SparkR ML Classification Functions

I'm wondering if it's possible (using the built-in features of SparkR or any other workaround) to extract the class probabilities from some of the classification algorithms included in SparkR. Particular ones of interest are spark.gbt() …
user331137
  • 41
  • 1
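
A minimal sketch of the usual starting point, assuming a Spark 2.x session: some releases attach a probability column to the prediction DataFrame from spark.logit, while spark.gbt() may expose only the predicted label, so inspecting the schema shows what a given version returns. The label derivation below is illustrative.

    library(SparkR)
    sparkR.session()

    # Binary label derived from iris (SparkR rewrites "." in names to "_")
    df <- createDataFrame(iris)
    df$label <- ifelse(df$Species == "setosa", 1, 0)

    model <- spark.logit(df, label ~ Sepal_Length + Sepal_Width)
    pred  <- predict(model, df)

    # Whether a "probability" column appears (and whether spark.gbt()
    # exposes one at all) depends on the Spark release; check first.
    printSchema(pred)
    head(select(pred, "prediction"))
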
4
votes
2 answers

Not able to convert R data frame to Spark DataFrame

When I try to convert my local R data frame to a Spark DataFrame using: raw.data <- as.DataFrame(sc,raw.data) I get this error: 17/01/24 08:02:04 WARN RBackendHandler: cannot find matching method class…
Abhishek Gupta
  • 77
  • 1
  • 2
  • 9
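
The "cannot find matching method" warning is commonly a signature mismatch between SparkR 1.x and 2.x; a minimal sketch of both calling conventions (data values are illustrative):

    library(SparkR)
    sparkR.session()

    raw.data <- data.frame(x = 1:3, y = c("a", "b", "c"))

    # SparkR 1.6 expected the SQL context as the first argument:
    #   df <- as.DataFrame(sqlContext, raw.data)
    # From Spark 2.0 the session is implicit, so pass the data directly:
    df <- as.DataFrame(raw.data)
    head(df)
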
4
votes
1 answer

Install Spark on Windows for sparklyr

I have tried several tutorials on setting up Spark and Hadoop in a Windows environment, especially alongside R. This one resulted in this error by the time I hit figure 9. This tutorial from RStudio is giving me issues as well. When I get to…
d8aninja
  • 3,233
  • 4
  • 36
  • 60
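
One commonly suggested route on Windows is to let sparklyr manage the Spark download itself rather than wiring up SPARK_HOME and Hadoop by hand; a minimal sketch (the version number is illustrative):

    library(sparklyr)

    spark_install(version = "2.0.0")       # downloads and manages a local Spark build
    sc <- spark_connect(master = "local")  # connect to the managed installation
    spark_disconnect(sc)
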
4
votes
1 answer

SparkR 2.0 read.df throws "path does not exist" error

My SparkR 1.6 code does not work in Spark 2.0. I made the necessary changes, like creating sparkR.session() instead of sparkR.init() and not passing the sqlContext parameter, etc.… In the code below I am loading data from a couple folders into a…
narik
  • 51
  • 3
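
A sketch of the 2.0-style read, assuming the error comes from the path being resolved against the default filesystem (e.g. HDFS) rather than the local disk; the path is illustrative:

    library(SparkR)
    sparkR.session()

    # read.df in Spark 2.0 takes no sqlContext argument; an explicit
    # scheme pins the path to the intended filesystem.
    df <- read.df("file:///data/folder1", source = "csv",
                  header = "true", inferSchema = "true")
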
4
votes
1 answer

Zeppelin R interpreter fails to do anything

I am running Zeppelin 0.6.1 and its SparkR interpreter fails to do anything. It shows ERROR as the cell execution status but does not say what the error is. I used the binary package with all interpreters, zeppelin-0.6.1-bin-all.tgz. Tried many things but no…
khrist safalhai
  • 560
  • 5
  • 19
4
votes
1 answer

How to update to SparkR 2.0.0 package in R

I want to update from SparkR 1.4.0 to SparkR 2.0.0, but I get the following error: had non-zero exit status. This is because SparkR 2.0.0 is not available on CRAN. Similarly, going from SparkR 1.6.2 to SparkR 2.0.0, we get: Warning in install.packages…
Sahil Desai
  • 3,418
  • 4
  • 20
  • 41
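
Since SparkR 2.0.0 is not on CRAN, the usual workarounds load the copy bundled with a Spark distribution or install from the Apache GitHub mirror; a sketch (the path and tag name are assumptions):

    # Option 1: use the SparkR package shipped inside the Spark 2.0.0 distribution.
    Sys.setenv(SPARK_HOME = "/opt/spark-2.0.0-bin-hadoop2.7")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library(SparkR)

    # Option 2: install from the Apache Spark GitHub mirror (requires devtools).
    # devtools::install_github("apache/spark@v2.0.0", subdir = "R/pkg")
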
4
votes
0 answers

SparkR 2.0 dapply very slow

I just started testing SparkR 2.0 and find the execution of dapply very slow. For example, the following code: set.seed(2); random_DF <- data.frame(matrix(rnorm(1000000), 100000, 10)); system.time(dummy_res <- random_DF[random_DF[,1] > 1, ]) user system…
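
dapply ships every partition through an R worker process and back, which is usually the source of the slowdown; where the operation can be expressed as a column predicate, the built-in API avoids that round trip entirely. A sketch contrasting the two (sizes reduced for illustration):

    library(SparkR)
    sparkR.session()

    df <- createDataFrame(data.frame(matrix(rnorm(1000), 100, 10)))

    # Built-in column expression: runs in the JVM, no R serialization.
    fast <- filter(df, df$X1 > 1)

    # dapply equivalent: each partition is serialized to R and back.
    slow <- dapply(df, function(p) p[p[, 1] > 1, ], schema(df))

    count(fast)
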
4
votes
2 answers

Spark 2.0.0: SparkR CSV Import

I am trying to read a csv file into SparkR (running Spark 2.0.0) and to experiment with the newly added features. Using RStudio here. I am getting an error while "reading" the source file. My code: Sys.setenv(SPARK_HOME =…
turnip424
  • 322
  • 6
  • 16
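
In Spark 2.0 the csv source is built in, so no external package is required; a minimal read sketch in case the error comes from a 1.x-style call (the file name is illustrative):

    library(SparkR)
    sparkR.session()

    # 2.0 style: no sqlContext argument, and source is simply "csv".
    df <- read.df("flights.csv", source = "csv",
                  header = "true", inferSchema = "true")
    head(df)
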
4
votes
2 answers

SparkR - Creating Test and Train DataFrames for Data Mining

I wish to partition a SparkR DataFrame into two subsets, one for training and one for testing of a glm. My normal way of doing this in R is to create an array index of the rows, sample the array into a new array, and then subset the data based on…
SpiritusPrana
  • 480
  • 3
  • 13
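
Assuming Spark 2.0+, randomSplit replaces the manual index-sampling idiom directly (on 1.x a sample()-based workaround is needed); a sketch:

    library(SparkR)
    sparkR.session()

    df <- createDataFrame(iris)

    # 70/30 split; the seed makes the partition reproducible.
    splits <- randomSplit(df, weights = c(0.7, 0.3), seed = 42)
    train <- splits[[1]]
    test  <- splits[[2]]
    count(train); count(test)
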
4
votes
1 answer

Empty output when reading a csv file into Rstudio using SparkR

I'm a new user of SparkR. I'm trying to load a csv file into R using SparkR. Sys.setenv(SPARK_HOME="/usr/local/bin/spark-1.5.1-bin-hadoop2.6") .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths())) library(SparkR) sc <-…
Minnie
  • 43
  • 4
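
With Spark 1.5.1 the csv reader is not built in, so an empty or failed read is often the spark-csv package missing from the classpath; a sketch of the 1.x flow, assuming that package has been attached (see the SPARKR_SUBMIT_ARGS note further down the list):

    library(SparkR)
    sc <- sparkR.init(master = "local")
    sqlContext <- sparkRSQL.init(sc)

    # 1.x style: the sqlContext is passed explicitly and the csv source
    # comes from the external com.databricks:spark-csv package.
    df <- read.df(sqlContext, "data.csv",
                  source = "com.databricks.spark.csv", header = "true")
    head(df)
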
4
votes
1 answer

How to initialize a new Spark Context and executors number on YARN from RStudio

I am working with SparkR. I am able to set up a Spark context on YARN with the desired number of executors and executor cores with a command such as: spark/bin/sparkR --master yarn-client --num-executors 5 --executor-cores 5. Now I am trying to initialize a new…
Marcin
  • 7,834
  • 8
  • 52
  • 99
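
A sketch of the RStudio-side equivalent of those flags, assuming the SparkR 1.x API, where executor settings can be passed through sparkEnvir as standard Spark properties:

    library(SparkR)

    sc <- sparkR.init(master = "yarn-client",
                      sparkEnvir = list(spark.executor.instances = "5",
                                        spark.executor.cores     = "5"))
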
4
votes
3 answers

How do I apply a function to each value of a column in a SparkR DataFrame?

I am relatively new to SparkR. I downloaded Spark 1.4 and set up RStudio to use the SparkR library. However, I want to know how I can apply a function to each value in a column of a distributed DataFrame. Can someone please help? For example, this works…
Sagar
  • 41
  • 1
  • 2
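
The idiomatic replacement for sapply-style loops is a column expression, which is evaluated distributed without collecting the data; a sketch (a Spark 2.x session is used for brevity, but the same expressions work on 1.x DataFrames):

    library(SparkR)
    sparkR.session()

    df <- createDataFrame(iris)   # "." in names becomes "_"

    # Element-wise transformation of one column, computed on the cluster.
    df$Sepal_Area <- df$Sepal_Length * df$Sepal_Width
    head(select(df, "Sepal_Length", "Sepal_Width", "Sepal_Area"))
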
4
votes
3 answers

Reading Text file in SparkR 1.4.0

Does anyone know how to read a text file in SparkR version 1.4.0? Are there any Spark packages available for that?
Edwin Vivek N
  • 564
  • 8
  • 28
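
In SparkR 1.4.0 the RDD API was made private, so the often-cited workaround reaches the text reader through the ::: operator; a sketch (internal names vary across versions, so treat this as fragile, and the path is illustrative):

    library(SparkR)
    sc <- sparkR.init(master = "local")

    # textFile is not exported in 1.4; ::: accesses the private RDD API.
    rdd <- SparkR:::textFile(sc, "file:///tmp/notes.txt")
    SparkR:::take(rdd, 5)
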
4
votes
4 answers

Loading com.databricks.spark.csv via RStudio

I have installed Spark 1.4.0. I have also installed its R package SparkR, and I am able to use it via the sparkR shell and via RStudio; however, there is one difference I cannot resolve. When launching the SparkR shell with ./bin/sparkR --master local[7]…
Wannes Rosiers
  • 1,680
  • 1
  • 12
  • 18
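
The shell's --packages flag has no direct RStudio counterpart; the commonly cited workaround is to inject it through SPARKR_SUBMIT_ARGS before initializing the context (version strings are illustrative):

    # Must be set before sparkR.init() so the JVM is launched with the package.
    Sys.setenv("SPARKR_SUBMIT_ARGS" =
      '"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')

    library(SparkR)
    sc <- sparkR.init(master = "local[7]")
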
3
votes
1 answer

Unable to subset data using SparkR with the piping convention

I'm operating on some data that looks like the dataFrame below. The command that I'm performing is: library(magrittr) # subsetting the data for MAC-OS & sorting by event_timestamp; macDF <- eventsDF %>% SparkR::select("device", "event_timestamp")…
Riyaz Ali
  • 43
  • 4
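
SparkR verbs compose with magrittr's %>% as long as every step receives and returns a SparkDataFrame and columns are referenced through column() or the data frame itself, not as bare names; a sketch on a built-in dataset:

    library(SparkR)
    library(magrittr)
    sparkR.session()

    df <- createDataFrame(faithful)

    result <- df %>%
      SparkR::select("eruptions", "waiting") %>%
      SparkR::filter(column("waiting") > 70) %>%
      SparkR::arrange(column("waiting"))
    head(result)
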