Questions tagged [sparkr]

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.

SparkR exposes the Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster.

SparkR exposes the RDD API of Spark as distributed lists in R.

796 questions
4
votes
1 answer

Extracting Class Probabilities from SparkR ML Classification Functions

I'm wondering if it's possible (using the built-in features of SparkR or any other workaround) to extract the class probabilities from some of the classification algorithms included in SparkR. Particular ones of interest are spark.gbt() …
user331137
  • 41
  • 1
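
A minimal sketch of the usual starting point, assuming a Spark 2.x session: some releases attach a probability column to the prediction DataFrame from spark.logit, while spark.gbt() may expose only the predicted label, so inspecting the schema shows what a given version returns. The label derivation below is illustrative.

    library(SparkR)
    sparkR.session()

    # Binary label derived from iris (SparkR rewrites "." in names to "_")
    df <- createDataFrame(iris)
    df$label <- ifelse(df$Species == "setosa", 1, 0)

    model <- spark.logit(df, label ~ Sepal_Length + Sepal_Width)
    pred  <- predict(model, df)

    # Whether a "probability" column appears (and whether spark.gbt()
    # exposes one at all) depends on the Spark release; check first.
    printSchema(pred)
    head(select(pred, "prediction"))
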
4
votes
2 answers

Not able to convert R data frame to Spark DataFrame

When I try to convert my local R data frame to a Spark DataFrame using: raw.data <- as.DataFrame(sc,raw.data) I get this error: 17/01/24 08:02:04 WARN RBackendHandler: cannot find matching method class…
Abhishek Gupta
  • 77
  • 1
  • 2
  • 9
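
The "cannot find matching method" warning is commonly a signature mismatch between SparkR 1.x and 2.x; a minimal sketch of both calling conventions (data values are illustrative):

    library(SparkR)
    sparkR.session()

    raw.data <- data.frame(x = 1:3, y = c("a", "b", "c"))

    # SparkR 1.6 expected the SQL context as the first argument:
    #   df <- as.DataFrame(sqlContext, raw.data)
    # From Spark 2.0 the session is implicit, so pass the data directly:
    df <- as.DataFrame(raw.data)
    head(df)
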
4
votes
1 answer

Install Spark on Windows for sparklyr

I have tried several tutorials on setting up Spark and Hadoop in a Windows environment, especially alongside R. This one resulted in this error by the time I hit figure 9. This tutorial from RStudio is giving me issues as well. When I get to…
d8aninja
  • 3,233
  • 4
  • 36
  • 60
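
One commonly suggested route on Windows is to let sparklyr manage the Spark download itself rather than wiring up SPARK_HOME and Hadoop by hand; a minimal sketch (the version number is illustrative):

    library(sparklyr)

    spark_install(version = "2.0.0")       # downloads and manages a local Spark build
    sc <- spark_connect(master = "local")  # connect to the managed installation
    spark_disconnect(sc)
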
4
votes
1 answer

SparkR 2.0 read.df throws "path does not exist" error

My SparkR 1.6 code does not work in Spark 2.0. I made the necessary changes, like creating sparkR.session() instead of sparkR.init() and not passing the sqlContext parameter, etc.… In the code below I am loading data from a couple folders into a…
narik
  • 51
  • 3
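
A sketch of the 2.0-style read, assuming the error comes from the path being resolved against the default filesystem (e.g. HDFS) rather than the local disk; the path is illustrative:

    library(SparkR)
    sparkR.session()

    # read.df in Spark 2.0 takes no sqlContext argument; an explicit
    # scheme pins the path to the intended filesystem.
    df <- read.df("file:///data/folder1", source = "csv",
                  header = "true", inferSchema = "true")
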
4
votes
1 answer

Zeppelin R interpreter fails to do anything

I am running Zeppelin 0.6.1 and its SparkR interpreter fails to do anything. It shows ERROR as the cell execution status but does not say what the error is. I used the binary package with all interpreters, zeppelin-0.6.1-bin-all.tgz. Tried many things but no…
khrist safalhai
  • 560
  • 5
  • 19
4
votes
1 answer

How to update to SparkR 2.0.0 package in R

I want to update from SparkR 1.4.0 to SparkR 2.0.0, but I get the following error: had non-zero exit status. This is because SparkR 2.0.0 is not available on CRAN. Similarly, going from SparkR 1.6.2 to SparkR 2.0.0, we get: Warning in install.packages…
Sahil Desai
  • 3,418
  • 4
  • 20
  • 41
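
Since SparkR 2.0.0 is not on CRAN, the usual workarounds load the copy bundled with a Spark distribution or install from the Apache GitHub mirror; a sketch (the path and tag name are assumptions):

    # Option 1: use the SparkR package shipped inside the Spark 2.0.0 distribution.
    Sys.setenv(SPARK_HOME = "/opt/spark-2.0.0-bin-hadoop2.7")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library(SparkR)

    # Option 2: install from the Apache Spark GitHub mirror (requires devtools).
    # devtools::install_github("apache/spark@v2.0.0", subdir = "R/pkg")
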
4
votes
0 answers

SparkR 2.0 dapply very slow

I just started testing SparkR 2.0 and find the execution of dapply very slow. For example, the following code: set.seed(2); random_DF <- data.frame(matrix(rnorm(1000000), 100000, 10)); system.time(dummy_res <- random_DF[random_DF[,1] > 1, ]) user system…
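
dapply ships every partition through an R worker process and back, which is usually the source of the slowdown; where the operation can be expressed as a column predicate, the built-in API avoids that round trip entirely. A sketch contrasting the two (sizes reduced for illustration):

    library(SparkR)
    sparkR.session()

    df <- createDataFrame(data.frame(matrix(rnorm(1000), 100, 10)))

    # Built-in column expression: runs in the JVM, no R serialization.
    fast <- filter(df, df$X1 > 1)

    # dapply equivalent: each partition is serialized to R and back.
    slow <- dapply(df, function(p) p[p[, 1] > 1, ], schema(df))

    count(fast)
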
4
votes
2 answers

Spark 2.0.0: SparkR CSV Import

I am trying to read a csv file into SparkR (running Spark 2.0.0) and to experiment with the newly added features. Using RStudio here. I am getting an error while "reading" the source file. My code: Sys.setenv(SPARK_HOME =…
turnip424
  • 322
  • 6
  • 16
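
In Spark 2.0 the csv source is built in, so no external package is required; a minimal read sketch in case the error comes from a 1.x-style call (the file name is illustrative):

    library(SparkR)
    sparkR.session()

    # 2.0 style: no sqlContext argument, and source is simply "csv".
    df <- read.df("flights.csv", source = "csv",
                  header = "true", inferSchema = "true")
    head(df)
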
4
votes
2 answers

SparkR - Creating Test and Train DataFrames for Data Mining

I wish to partition a SparkR DataFrame into two subsets, one for training and one for testing of a glm. My normal way of doing this in R is to create an array index of the rows, sample the array into a new array, and then subset the data based on…
SpiritusPrana
  • 480
  • 3
  • 13
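
Assuming Spark 2.0+, randomSplit replaces the manual index-sampling idiom directly (on 1.x a sample()-based workaround is needed); a sketch:

    library(SparkR)
    sparkR.session()

    df <- createDataFrame(iris)

    # 70/30 split; the seed makes the partition reproducible.
    splits <- randomSplit(df, weights = c(0.7, 0.3), seed = 42)
    train <- splits[[1]]
    test  <- splits[[2]]
    count(train); count(test)
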
4
votes
1 answer

Empty output when reading a csv file into Rstudio using SparkR

I'm a new user of SparkR. I'm trying to load a csv file into R using SparkR. Sys.setenv(SPARK_HOME="/usr/local/bin/spark-1.5.1-bin-hadoop2.6") .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths())) library(SparkR) sc <-…
Minnie
  • 43
  • 4
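
With Spark 1.5.1 the csv reader is not built in, so an empty or failed read is often the spark-csv package missing from the classpath; a sketch of the 1.x flow, assuming that package has been attached (see the SPARKR_SUBMIT_ARGS note further down the list):

    library(SparkR)
    sc <- sparkR.init(master = "local")
    sqlContext <- sparkRSQL.init(sc)

    # 1.x style: the sqlContext is passed explicitly and the csv source
    # comes from the external com.databricks:spark-csv package.
    df <- read.df(sqlContext, "data.csv",
                  source = "com.databricks.spark.csv", header = "true")
    head(df)
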
4
votes
1 answer

How to initialize a new Spark Context and executors number on YARN from RStudio

I am working with SparkR. I am able to set up a Spark context on YARN with the desired number of executors and executor cores with a command such as: spark/bin/sparkR --master yarn-client --num-executors 5 --executor-cores 5. Now I am trying to initialize a new…
Marcin
  • 7,834
  • 8
  • 52
  • 99
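
A sketch of the RStudio-side equivalent of those flags, assuming the SparkR 1.x API, where executor settings can be passed through sparkEnvir as standard Spark properties:

    library(SparkR)

    sc <- sparkR.init(master = "yarn-client",
                      sparkEnvir = list(spark.executor.instances = "5",
                                        spark.executor.cores     = "5"))
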
4
votes
3 answers

How do I apply a function to each value of a column in a SparkR DataFrame?

I am relatively new to SparkR. I downloaded Spark 1.4 and set up RStudio to use the SparkR library. However, I want to know how I can apply a function to each value in a column of a distributed DataFrame. Can someone please help? For example, this works…
Sagar
  • 41
  • 1
  • 2
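
The idiomatic replacement for sapply-style loops is a column expression, which is evaluated distributed without collecting the data; a sketch (a Spark 2.x session is used for brevity, but the same expressions work on 1.x DataFrames):

    library(SparkR)
    sparkR.session()

    df <- createDataFrame(iris)   # "." in names becomes "_"

    # Element-wise transformation of one column, computed on the cluster.
    df$Sepal_Area <- df$Sepal_Length * df$Sepal_Width
    head(select(df, "Sepal_Length", "Sepal_Width", "Sepal_Area"))
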
4
votes
3 answers

Reading Text file in SparkR 1.4.0

Does anyone know how to read a text file in SparkR version 1.4.0? Are there any Spark packages available for that?
Edwin Vivek N
  • 564
  • 8
  • 28
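
In SparkR 1.4.0 the RDD API was made private, so the often-cited workaround reaches the text reader through the ::: operator; a sketch (internal names vary across versions, so treat this as fragile, and the path is illustrative):

    library(SparkR)
    sc <- sparkR.init(master = "local")

    # textFile is not exported in 1.4; ::: accesses the private RDD API.
    rdd <- SparkR:::textFile(sc, "file:///tmp/notes.txt")
    SparkR:::take(rdd, 5)
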
4
votes
4 answers

Loading com.databricks.spark.csv via RStudio

I have installed Spark 1.4.0. I have also installed its R package SparkR, and I am able to use it via the sparkR shell and via RStudio; however, there is one difference I cannot resolve. When launching the SparkR shell with ./bin/sparkR --master local[7]…
Wannes Rosiers
  • 1,680
  • 1
  • 12
  • 18
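
The shell's --packages flag has no direct RStudio counterpart; the commonly cited workaround is to inject it through SPARKR_SUBMIT_ARGS before initializing the context (version strings are illustrative):

    # Must be set before sparkR.init() so the JVM is launched with the package.
    Sys.setenv("SPARKR_SUBMIT_ARGS" =
      '"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')

    library(SparkR)
    sc <- sparkR.init(master = "local[7]")
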
3
votes
1 answer

Unable to subset data using SparkR with the piping convention

I'm operating on some data that looks like the dataFrame below. The command that I'm performing is: library(magrittr) # subsetting the data for MAC-OS & sorting by event_timestamp; macDF <- eventsDF %>% SparkR::select("device", "event_timestamp")…
Riyaz Ali
  • 43
  • 4
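
SparkR verbs compose with magrittr's %>% as long as every step receives and returns a SparkDataFrame and columns are referenced through column() or the data frame itself, not as bare names; a sketch on a built-in dataset:

    library(SparkR)
    library(magrittr)
    sparkR.session()

    df <- createDataFrame(faithful)

    result <- df %>%
      SparkR::select("eruptions", "waiting") %>%
      SparkR::filter(column("waiting") > 70) %>%
      SparkR::arrange(column("waiting"))
    head(result)
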