Questions tagged [sparkr]

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.


SparkR exposes the Spark API through the RDD class as distributed lists in R, and allows users to interactively run jobs on a cluster from the R shell.


796 questions
5 votes, 1 answer

Run a R Model using SparkR

Thanks in advance for your input. I am a newbie to ML. I've developed an R model (using RStudio on my local machine) and want to deploy it on a Hadoop cluster that has RStudio installed. I want to use SparkR to leverage high-performance computing. I just want…
Suri
5 votes, 1 answer

How can one debug/get logs for failures of the SparkR Java backend?

I'm bedeviled by a "No status is returned. Java SparkR backend might have failed." error when fitting a glm using Spark. The job actually appears to run to completion based on the Spark web UI, but at some point during the model fit (it doesn't appear to…
russellpierce
5 votes, 1 answer

Difference between collect and as.data.frame in sparkR

What is the difference between as.data.frame() and collect() when pulling a DataFrame object into local memory?
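In recent SparkR versions, as.data.frame() on a Spark DataFrame is documented as an alias for collect(): both materialize the full distributed dataset on the driver as a base-R data.frame. A minimal sketch, assuming a local Spark 2.x+ session:

```r
library(SparkR)
sparkR.session(master = "local[*]")   # Spark 2.x+ entry point

df <- createDataFrame(faithful)       # distribute a built-in local data.frame

# Both calls pull the whole dataset back to the driver as a base-R data.frame;
# as.data.frame() is essentially a wrapper around collect() for SparkDataFrames.
local1 <- collect(df)
local2 <- as.data.frame(df)

identical(local1, local2)
```

Either way the entire DataFrame must fit in driver and R memory, so for large data it is usually better to aggregate or filter on the cluster first.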
5 votes, 1 answer

socketConnection error in sparkR

I am very new to SparkR. When I ran SparkR, something went wrong. sc <- sparkR.init(master="local") gave an error like this: Error in socketConnection(port = monitorPort) : cannot open the connection In addition: Warning message: In socketConnection(port…
wesson.Gan
5 votes, 1 answer

SparkR bottleneck in createDataFrame?

I'm new to Spark, SparkR and generally all HDFS-related technologies. I recently installed Spark 1.5.0 and ran some simple code with…
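The bottleneck described above is a known pattern: createDataFrame() serializes a local R data.frame through the R-to-JVM bridge, which gets slow as the input grows. A common workaround (sketched below; the file path is an assumption) is to write the data to a format Spark can read natively and load it with read.df() instead:

```r
library(SparkR)
sparkR.session(master = "local[*]")

# Slow path for large inputs: the local data.frame is serialized row-block by
# row-block from the R process into the JVM.
big <- data.frame(x = rnorm(1e5), y = rnorm(1e5))
sdf <- createDataFrame(big)

# Workaround: write locally, then let Spark's native CSV reader do the ingest.
write.csv(big, "/tmp/big.csv", row.names = FALSE)
sdf2 <- read.df("/tmp/big.csv", source = "csv",
                header = "true", inferSchema = "true")
```

For data that already lives on disk or in HDFS, skipping createDataFrame() entirely and reading with read.df() avoids the bridge altogether.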
5 votes, 3 answers

How to load csv file into SparkR on RStudio?

How do you load a csv file into SparkR on RStudio? Below are the steps I had to perform to run SparkR on RStudio. I used read.df to read the .csv; not sure how else to write this. Not sure if this step is considered to create RDDs. #Set sys…
sharp
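A sketch of the usual RStudio setup for this, in both API generations (the SPARK_HOME path and spark-csv version are assumptions; adjust to your install). In Spark 1.x, CSV support came from the external spark-csv package; from Spark 2.0 it is built in:

```r
# Point SparkR at the local Spark installation before loading the package.
Sys.setenv(SPARK_HOME = "/opt/spark")
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

# Spark 1.x style: pull in the spark-csv package at init time.
sc <- sparkR.init(master = "local[*]",
                  sparkPackages = "com.databricks:spark-csv_2.10:1.3.0")
sqlContext <- sparkRSQL.init(sc)
df <- read.df(sqlContext, "data.csv",
              source = "com.databricks.spark.csv",
              header = "true", inferSchema = "true")

# Spark 2.x+ style: CSV is a built-in source.
# sparkR.session()
# df <- read.df("data.csv", source = "csv",
#               header = "true", inferSchema = "true")
```

read.df() returns a distributed DataFrame, not an RDD; SparkR's public API works at the DataFrame level.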
5 votes, 2 answers

Options to read large files (pure text, xml, json, csv) from hdfs in RStudio with SparkR 1.5

I am new to Spark and would like to know if there are other options than the ones below to read data stored in HDFS from RStudio using SparkR, or whether I am using them correctly. The data could be any kind (pure text, csv, json, xml or any database…
4711
5 votes, 2 answers

Efficiently Aggregate Many CSVs in Spark

Pardon my simple question but I'm relatively new to Spark/Hadoop. I'm trying to load a bunch of small CSV files into Apache Spark. They're currently stored in S3, but I can download them locally if that simplifies things. My goal is to do this as…
Jeff Allen
5 votes, 0 answers

Save sparkR dataframe with HiveContext using saveAsTable command

How do I save a SparkR data frame when working with a HiveContext, using the saveAsTable command? df_5 <- loadDF(sqlContext, "Report02_cashier_Hourly_total_Trans_july30.parquet", "parquet") /*I loaded the parquet file as a dataframe*/ sqlContext <-…
Arun Gunalan
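In SparkR 1.x, persistent tables require a HiveContext created with sparkRHive.init() rather than the plain SQLContext. A minimal sketch under that assumption (the target table name is made up for illustration):

```r
library(SparkR)
sc <- sparkR.init(master = "local[*]")
hiveContext <- sparkRHive.init(sc)   # HiveContext: needed for persistent tables

# Load the parquet file as a DataFrame against the Hive-enabled context.
df <- loadDF(hiveContext,
             "Report02_cashier_Hourly_total_Trans_july30.parquet",
             "parquet")

# Persist it as a managed table; mode = "overwrite" replaces any existing table.
saveAsTable(df, tableName = "cashier_hourly_totals",
            source = "parquet", mode = "overwrite")
```

A frequent cause of failures here is creating the DataFrame against one context and calling saveAsTable() after re-initializing another; keep a single Hive-enabled context for both steps.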
4 votes, 0 answers

How to configure Spark / Databricks memory to collect large R data.frame?

Out-of-memory issues caused by collecting a Spark DataFrame into an R data.frame have been discussed here several times (e.g. here or here). However, none of the answers seems usable in my environment. Problem: I'm trying to collect some transactional data…
Dan
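Since collect() funnels the whole result through the driver JVM and then into the R process, the settings that bound it are spark.driver.memory and spark.driver.maxResultSize. A sketch of setting them at session start (the sizes are assumptions, not recommendations):

```r
library(SparkR)

# Both limits must allow the collected result: maxResultSize caps the bytes
# the driver will accept from executors, driver.memory caps the JVM heap.
sparkR.session(sparkConfig = list(
  spark.driver.memory        = "16g",
  spark.driver.maxResultSize = "8g"
))
```

On Databricks the session is typically pre-created, so these would instead go in the cluster's Spark config; and if the data still does not fit, aggregating or sampling on the cluster before collect() is the more robust fix.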
4 votes, 2 answers

sparkr databricks error: too many open devices

I was doing simple data manipulation using SparkR on Databricks. The code was working just fine a minute ago, and suddenly I started getting the following error: Error in png(fileName, width = plotWidth, height = plotHeight, pointsize =…
Geet
4 votes, 1 answer

Unnest (separate) multiple column values into new rows using Sparklyr

I am trying to split column values separated by commas into new rows based on ids. I know how to do this in R using dplyr and tidyr, but I am looking to solve the same problem in sparklyr. id <- c(1,1,1,1,1,2,2,2,3,3,3) name <-…
Rushabh Patel
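Because sparklyr translates dplyr verbs to Spark SQL, Hive's split() and explode() can be used inside mutate() to fan comma-separated values out into one row per value. A sketch assuming a local Spark connection:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

local_df <- data.frame(id   = c(1, 2, 3),
                       name = c("a,b,c", "d,e", "f"),
                       stringsAsFactors = FALSE)
tbl <- copy_to(sc, local_df, "tbl", overwrite = TRUE)

# split() turns the string into an array column; explode() emits one row per
# array element. Both are Spark SQL functions passed through untranslated.
tbl %>%
  mutate(name = explode(split(name, ","))) %>%
  collect()
```

This is the Spark-side equivalent of tidyr's separate_rows(), which does not run on a remote tbl.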
4 votes, 1 answer

How to run SparkR script using spark-submit or sparkR on an EMR cluster?

I have written a SparkR script and wonder whether I can submit it using spark-submit or sparkR on an EMR cluster. I have tried several ways, for example: sparkR mySparkRScript.r or sparkR --no-save mySparkScript.r etc., but every time I get the below…
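spark-submit dispatches .R/.r files to the SparkR runner automatically, so on an EMR cluster the script can be submitted like any Scala or Python job. A sketch (master, deploy mode and resource sizes are assumptions for a typical EMR/YARN setup):

```shell
# Basic submission: YARN master, driver runs on the node you submit from.
spark-submit --master yarn --deploy-mode client mySparkRScript.r

# Spark options and script arguments work the same as for other languages.
spark-submit --master yarn \
  --driver-memory 4g \
  --num-executors 10 \
  mySparkRScript.r arg1 arg2
```

Inside the script, create the session yourself (sparkR.session() on Spark 2.x+), since spark-submit does not start an interactive SparkR shell.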
4 votes, 1 answer

SparkR Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState'

I am installing SparkR on my Windows 8.1 machine from this tutorial: https://www.linkedin.com/pulse/setting-up-sparkr-windows-machine-ramabhadran-kapistalam. I finished it, so I guess it's set up correctly. The problem is when I try to run an example with a…
4 votes, 0 answers

matrix operation in sparkR

For creating matrices, the Spark documentation gives 4 data types: 1. RowMatrix, 2. IndexedRowMatrix, 3. CoordinateMatrix and 4. BlockMatrix. But all the data types are explained only for Scala, Java and Python. I want to perform matrix operations in Spark…
Siddhu