Questions tagged [sparkr]

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.

SparkR exposes the Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster.

SparkR exposes the RDD API of Spark as distributed lists in R.
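A minimal session sketch using the Spark 2.x+ entry point (the data set is R's built-in faithful):

```r
library(SparkR)
sparkR.session(master = "local[*]")   # start or reuse a local Spark session

df <- createDataFrame(faithful)       # distribute a local R data.frame
head(filter(df, df$waiting < 50))     # filter runs on Spark; head() collects a few rows

sparkR.session.stop()
```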

796 questions
7 votes · 1 answer

How to check for intersection of two DataFrame columns in Spark

Using either pyspark or sparkr (preferably both), how can I get the intersection of two DataFrame columns? For example, in sparkr I have the following DataFrames: newHires <- data.frame(name = c("Thomas", "George", "George", "John"), …
Gaurav Bansal
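A sketch of one approach in SparkR (Spark 2.x; the employees table is invented for illustration):

```r
library(SparkR)
sparkR.session()

newHires  <- createDataFrame(data.frame(name = c("Thomas", "George", "George", "John")))
employees <- createDataFrame(data.frame(name = c("George", "John", "Alice")))

# intersect() of the single-column projections yields the common values
common <- intersect(select(newHires, "name"), select(employees, "name"))
collect(common)
```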
7 votes · 1 answer

Not able to retrieve data from SparkR created DataFrame

I have below simple SparkR program, which is to create a SparkR DataFrame and retrieve/collect data from it. Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf.cloudera.yarn") Sys.setenv(SPARK_HOME =…
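For reference, the minimal create-and-collect round trip with the Spark 2.x+ session API (the question itself uses the older sparkR.init() path):

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)   # faithful is a built-in R data set
head(collect(df))                 # bring the rows back into the local R session
```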
7 votes · 2 answers

Using apply functions in SparkR

I am currently trying to implement some functions using sparkR version 1.5.1. I have seen older (version 1.3) examples, where people used the apply function on DataFrames, but it looks like this is no longer directly available. Example: x =…
bmcMunich
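In SparkR 1.5.1 the RDD-level apply functions were private; since Spark 2.0 the documented route is dapply()/gapply(). A sketch with dapply():

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(data.frame(x = 1:10))

# dapply() runs an R function over each partition; the output schema
# must be declared explicitly
doubled <- dapply(df,
                  function(part) { part$x <- part$x * 2; part },
                  structType(structField("x", "double")))
head(collect(doubled))
```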
7 votes · 4 answers

SparkR Error in sparkR.init(master="local") in RStudio

I have installed the SparkR package from Spark distribution into the R library. I can call the following command and it seems to work properly: library(SparkR) However, when I try to get the Spark context using the following code, sc <-…
Umesh K
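A common fix is pointing R at the Spark installation before loading the package (paths are illustrative; Spark 1.x API as in the question):

```r
Sys.setenv(SPARK_HOME = "/opt/spark")   # adjust to your installation
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)
sc <- sparkR.init(master = "local")     # Spark 1.x; use sparkR.session() on 2.x+
```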
7 votes · 3 answers

How to read csv into sparkR ver 1.4?

As a new version of Spark (1.4) was released, there appeared a nice frontend interface to Spark from the R package named SparkR. On the documentation page of R for Spark there is a command that enables reading JSON files as RDD objects: people…
Marcin
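In Spark 1.4 CSV support came from the external spark-csv package, so a sketch looks like this (file name illustrative):

```r
# Start SparkR with the csv reader on the classpath, e.g.:
#   sparkR --packages com.databricks:spark-csv_2.10:1.0.3
library(SparkR)
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

people <- read.df(sqlContext, "people.csv",
                  source = "com.databricks.spark.csv",
                  header = "true", inferSchema = "true")
head(people)
```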
6 votes · 0 answers

Losing column names when writing a SparkDataFrame with SparkR write.df

Context: I'm working on an Azure HDI R Server cluster with RStudio and the SparkR package. I'm reading a file, modifying it, and then I want to write it with write.df, but the problem is that when I write the file, my column names disappear. My code is the…
Orhan Yazar
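With the csv source the header is off by default, so the usual remedy is passing header = "true" to write.df(). A sketch (output path illustrative):

```r
library(SparkR)
sparkR.session()

sdf <- createDataFrame(mtcars)

# Without header = "true" the csv writer emits data rows only, and the
# column names are lost on the way out
write.df(sdf, path = "mtcars_csv", source = "csv",
         mode = "overwrite", header = "true")
```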
6 votes · 1 answer

SparkR DataFrame partitioning issue

In my R script, I have a SparkDataFrame of two columns (time, value) containing data for four different months. Because I need to apply my function to each month separately, I figured I would repartition it into four partitions…
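One common alternative to hand-repartitioning is letting gapply() do the per-month split. A sketch assuming columns time (timestamp) and value, with sdf standing in for the question's SparkDataFrame:

```r
library(SparkR)

sdf <- withColumn(sdf, "month", month(sdf$time))  # derive the grouping key

res <- gapply(sdf, "month",
              function(key, df) {
                data.frame(month = key[[1]], mean_value = mean(df$value))
              },
              structType(structField("month", "integer"),
                         structField("mean_value", "double")))
head(res)
```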
6 votes · 1 answer

Is it possible to use data.table on SparkR with Sparkdataframes?

Situation: I used to work in RStudio with data.table instead of plyr or sqldf because it's really fast. Now I'm working with SparkR on an Azure cluster and I'd like to know if I can use data.table on my Spark DataFrames, and whether it's faster than SQL.
Orhan Yazar
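data.table operates on local in-memory tables, not on distributed SparkDataFrames, so the frame has to be collected first. A sketch (sdf and its name/value columns are assumptions):

```r
library(SparkR)
library(data.table)

local_dt <- as.data.table(collect(sdf))       # pulls all rows to the driver
local_dt[, .(avg = mean(value)), by = name]   # regular data.table syntax from here
```

Collecting only pays off when the data fits in the driver's memory; otherwise staying with Spark SQL or gapply() is the safer route.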
6 votes · 2 answers

Getting last value of group in Spark

I have a SparkR DataFrame as shown below: #Create R data.frame custId <- c(rep(1001, 5), rep(1002, 3), 1003) date <- c('2013-08-01','2014-01-01','2014-02-01','2014-03-01','2014-04-01','2014-02-01','2014-03-01','2014-04-01','2014-04-01') desc <-…
Gaurav Bansal
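A window-function sketch: ordering each customer's rows by date descending and taking first() yields the chronologically last value (df and its custId/date/desc columns follow the question's excerpt):

```r
library(SparkR)

# Partition by customer, latest date first; first() over that window
# returns each group's most recent row
ws  <- orderBy(windowPartitionBy("custId"), desc(df$date))
df2 <- withColumn(df, "last_desc", over(first(df$desc), ws))
```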
6 votes · 3 answers

Get mode (most often) value in Spark column with groupBy

I have a SparkR DataFrame and I want to get the mode (most often) value for each unique name. How can I do this? There doesn't seem to be a built-in mode function. Either a SparkR or PySpark solution will do. # Create DF df <- data.frame(name =…
Gaurav Bansal
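One SparkR sketch: count occurrences per (name, value), then keep the top row per name with a row_number() window (the value column is an assumption, since the question's excerpt is truncated):

```r
library(SparkR)

counts <- count(groupBy(df, "name", "value"))          # frequency per pair
ws     <- orderBy(windowPartitionBy("name"), desc(counts$count))
ranked <- withColumn(counts, "rn", over(row_number(), ws))

modes <- select(filter(ranked, ranked$rn == 1), "name", "value")
```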
6 votes · 1 answer

How to list spark-packages added to the Spark context?

Is it possible to list what Spark packages have been added to the Spark session? The class org.apache.spark.deploy.SparkSubmitArguments has a variable for the packages: var packages: String = null. Assuming this is a list of the Spark packages, is…
Chris Snow
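From SparkR 2.x the runtime configuration is queryable from R, and --packages entries surface under spark.jars.packages. A sketch:

```r
library(SparkR)
sparkR.session()

conf <- sparkR.conf()            # full session configuration as a named list
conf[["spark.jars.packages"]]    # NULL unless packages were supplied
```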
6 votes · 1 answer

How to do map and reduce in SparkR

How do I do map and reduce operations using SparkR? All I can find is stuff about SQL queries. Is there a way to do map and reduce using SQL?
Matthew Jones
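The RDD API is no longer public in SparkR, but spark.lapply() covers a distributed map over a list, and dapply() plus aggregations cover the map/reduce pattern on SparkDataFrames. A sketch:

```r
library(SparkR)
sparkR.session()

# map: run an R function on each element of a list, in parallel
squares <- spark.lapply(1:10, function(x) x^2)

# reduce: aggregate a SparkDataFrame column
df <- createDataFrame(data.frame(x = 1:10))
collect(agg(df, total = sum(df$x)))
```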
5 votes · 0 answers

SparkR code fails if Apache Arrow is enabled

I am running the gapply function on a SparkDataFrame, which looks like below: df <- gapply(sp_Stack, function(key, e) { Sys.setlocale('LC_COLLATE', 'C') suppressPackageStartupMessages({ library(Rcpp) library(Matrix) …
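When the Arrow-optimized R path misbehaves, a common workaround is disabling it for the session (Spark 3.x config key):

```r
library(SparkR)

sparkR.session(sparkConfig = list(
  spark.sql.execution.arrow.sparkr.enabled = "false"  # fall back to the non-Arrow serializer
))
```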
5 votes · 2 answers

Efficient way to read and write data into files over a loop using R

I am trying to read and write data into files at each time step. To do this, I am using the package h5 to store large datasets but I find that my code using the functions of this package is running slowly. I am working with very large datasets. So,…
Nell
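With the h5 package, the usual speed-up is opening the file once outside the loop and writing one dataset per step (file and dataset names are illustrative):

```r
library(h5)   # CRAN package used in the question; hdf5r is its successor

f <- h5file("results.h5", mode = "a")   # open once, not inside the loop
for (t in 1:100) {
  f[paste0("step_", t)] <- matrix(rnorm(1e4), ncol = 100)
}
h5close(f)
```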
5 votes · 1 answer

Find variables making Primary Key using SparkR

Here is my toy data: df <- tibble::tribble( ~var1, ~var2, ~var3, ~var4, ~var5, ~var6, ~var7, "A", "C", 1L, 5L, "AA", "AB", 1L, "A", "C", 2L, 5L, "BB", "AC", 2L, "A", "D", 1L, 7L, "AA", "BC", 2L, …
Geet
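A column set is a candidate key when its distinct count equals the total row count. A SparkR sketch (sdf stands in for the Spark version of the question's df):

```r
library(SparkR)

# TRUE when the given columns uniquely identify every row
is_key <- function(sdf, cols) {
  nrow(distinct(select(sdf, cols))) == nrow(sdf)
}

is_key(sdf, c("var1", "var3"))   # test one candidate combination
```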