Questions tagged [sparkr]

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.


SparkR exposes the Spark API through the RDD class as distributed lists in R, and allows users to interactively run jobs on a cluster from the R shell.


796 questions
5 votes, 1 answer

Run a R Model using SparkR

Thanks in advance for your input. I am a newbie to ML. I've developed an R model (using RStudio on my local machine) and want to deploy it on a Hadoop cluster that has RStudio installed. I want to use SparkR to leverage high-performance computing. I just want…
Suri
5 votes, 1 answer

How can one debug/get logs for failures of the SparkR Java backend?

I'm bedeviled by a "No status is returned. Java SparkR backend might have failed." error when fitting a glm using Spark. The job actually appears to run to completion based on the Spark web UI, but at some point during the model fit (it doesn't appear to…
russellpierce
5 votes, 1 answer

Difference between collect and as.data.frame in sparkR

What is the difference between as.data.frame() and collect() when pulling a DataFrame object into local memory?
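In recent SparkR versions, as.data.frame() on a Spark DataFrame is documented as an alias for collect(): both materialize the full distributed dataset on the driver as a base-R data.frame. A minimal sketch, assuming a local Spark 2.x+ session:

```r
library(SparkR)
sparkR.session(master = "local[*]")   # Spark 2.x+ entry point

df <- createDataFrame(faithful)       # distribute a built-in local data.frame

# Both calls pull the whole dataset back to the driver as a base-R data.frame;
# as.data.frame() is essentially a wrapper around collect() for SparkDataFrames.
local1 <- collect(df)
local2 <- as.data.frame(df)

identical(local1, local2)
```

Either way the entire DataFrame must fit in driver and R memory, so for large data it is usually better to aggregate or filter on the cluster first.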
5 votes, 1 answer

socketConnection error in sparkR

I am very new to SparkR. When I ran SparkR, something went wrong. sc <- sparkR.init(master="local") gave an error like this: Error in socketConnection(port = monitorPort) : cannot open the connection In addition: Warning message: In socketConnection(port…
wesson.Gan
5 votes, 1 answer

SparkR bottleneck in createDataFrame?

I'm new to Spark, SparkR and generally all HDFS-related technologies. I recently installed Spark 1.5.0 and ran some simple code with…
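The bottleneck described above is a known pattern: createDataFrame() serializes a local R data.frame through the R-to-JVM bridge, which gets slow as the input grows. A common workaround (sketched below; the file path is an assumption) is to write the data to a format Spark can read natively and load it with read.df() instead:

```r
library(SparkR)
sparkR.session(master = "local[*]")

# Slow path for large inputs: the local data.frame is serialized row-block by
# row-block from the R process into the JVM.
big <- data.frame(x = rnorm(1e5), y = rnorm(1e5))
sdf <- createDataFrame(big)

# Workaround: write locally, then let Spark's native CSV reader do the ingest.
write.csv(big, "/tmp/big.csv", row.names = FALSE)
sdf2 <- read.df("/tmp/big.csv", source = "csv",
                header = "true", inferSchema = "true")
```

For data that already lives on disk or in HDFS, skipping createDataFrame() entirely and reading with read.df() avoids the bridge altogether.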
5 votes, 3 answers

How to load csv file into SparkR on RStudio?

How do you load a csv file into SparkR on RStudio? Below are the steps I had to perform to run SparkR on RStudio. I used read.df to read the .csv; not sure how else to write this. Not sure if this step is considered to create RDDs. #Set sys…
sharp
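A sketch of the usual RStudio setup for this, in both API generations (the SPARK_HOME path and spark-csv version are assumptions; adjust to your install). In Spark 1.x, CSV support came from the external spark-csv package; from Spark 2.0 it is built in:

```r
# Point SparkR at the local Spark installation before loading the package.
Sys.setenv(SPARK_HOME = "/opt/spark")
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

# Spark 1.x style: pull in the spark-csv package at init time.
sc <- sparkR.init(master = "local[*]",
                  sparkPackages = "com.databricks:spark-csv_2.10:1.3.0")
sqlContext <- sparkRSQL.init(sc)
df <- read.df(sqlContext, "data.csv",
              source = "com.databricks.spark.csv",
              header = "true", inferSchema = "true")

# Spark 2.x+ style: CSV is a built-in source.
# sparkR.session()
# df <- read.df("data.csv", source = "csv",
#               header = "true", inferSchema = "true")
```

read.df() returns a distributed DataFrame, not an RDD; SparkR's public API works at the DataFrame level.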
5 votes, 2 answers

Options to read large files (pure text, xml, json, csv) from hdfs in RStudio with SparkR 1.5

I am new to Spark and would like to know if there are other options than the ones below to read data stored in HDFS from RStudio using SparkR, or whether I am using them correctly. The data could be any kind (pure text, csv, json, xml or any database…
4711
5 votes, 2 answers

Efficiently Aggregate Many CSVs in Spark

Pardon my simple question but I'm relatively new to Spark/Hadoop. I'm trying to load a bunch of small CSV files into Apache Spark. They're currently stored in S3, but I can download them locally if that simplifies things. My goal is to do this as…
Jeff Allen
5 votes, 0 answers

Save sparkR dataframe with HiveContext using saveAsTable command

How do I save a SparkR data frame when working with a HiveContext, using the saveAsTable command? df_5 <- loadDF(sqlContext, "Report02_cashier_Hourly_total_Trans_july30.parquet", "parquet") /*I loaded the parquet file as a dataframe*/ sqlContext <-…
Arun Gunalan
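In SparkR 1.x, persistent tables require a HiveContext created with sparkRHive.init() rather than the plain SQLContext. A minimal sketch under that assumption (the target table name is made up for illustration):

```r
library(SparkR)
sc <- sparkR.init(master = "local[*]")
hiveContext <- sparkRHive.init(sc)   # HiveContext: needed for persistent tables

# Load the parquet file as a DataFrame against the Hive-enabled context.
df <- loadDF(hiveContext,
             "Report02_cashier_Hourly_total_Trans_july30.parquet",
             "parquet")

# Persist it as a managed table; mode = "overwrite" replaces any existing table.
saveAsTable(df, tableName = "cashier_hourly_totals",
            source = "parquet", mode = "overwrite")
```

A frequent cause of failures here is creating the DataFrame against one context and calling saveAsTable() after re-initializing another; keep a single Hive-enabled context for both steps.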
4 votes, 0 answers

How to configure Spark / Databricks memory to collect large R data.frame?

Out-of-memory issues caused by collecting a Spark DataFrame into an R data.frame have been discussed here several times (e.g. here or here). However, none of the answers seems usable in my environment. Problem: I'm trying to collect some transactional data…
Dan
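Since collect() funnels the whole result through the driver JVM and then into the R process, the settings that bound it are spark.driver.memory and spark.driver.maxResultSize. A sketch of setting them at session start (the sizes are assumptions, not recommendations):

```r
library(SparkR)

# Both limits must allow the collected result: maxResultSize caps the bytes
# the driver will accept from executors, driver.memory caps the JVM heap.
sparkR.session(sparkConfig = list(
  spark.driver.memory        = "16g",
  spark.driver.maxResultSize = "8g"
))
```

On Databricks the session is typically pre-created, so these would instead go in the cluster's Spark config; and if the data still does not fit, aggregating or sampling on the cluster before collect() is the more robust fix.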
4 votes, 2 answers

sparkr databricks error: too many open devices

I was doing simple data manipulation using SparkR on Databricks. The code was working just fine a minute ago, and suddenly I started getting the following error: Error in png(fileName, width = plotWidth, height = plotHeight, pointsize =…
Geet
4 votes, 1 answer

Unnest (separate) multiple column values into new rows using Sparklyr

I am trying to split column values separated by commas into new rows based on ids. I know how to do this in R using dplyr and tidyr, but I am looking to solve the same problem in sparklyr. id <- c(1,1,1,1,1,2,2,2,3,3,3) name <-…
Rushabh Patel
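Because sparklyr translates dplyr verbs to Spark SQL, Hive's split() and explode() can be used inside mutate() to fan comma-separated values out into one row per value. A sketch assuming a local Spark connection:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

local_df <- data.frame(id   = c(1, 2, 3),
                       name = c("a,b,c", "d,e", "f"),
                       stringsAsFactors = FALSE)
tbl <- copy_to(sc, local_df, "tbl", overwrite = TRUE)

# split() turns the string into an array column; explode() emits one row per
# array element. Both are Spark SQL functions passed through untranslated.
tbl %>%
  mutate(name = explode(split(name, ","))) %>%
  collect()
```

This is the Spark-side equivalent of tidyr's separate_rows(), which does not run on a remote tbl.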
4 votes, 1 answer

How to run SparkR script using spark-submit or sparkR on an EMR cluster?

I have written a SparkR script and wonder whether I can submit it using spark-submit or sparkR on an EMR cluster. I have tried several ways, for example: sparkR mySparkRScript.r or sparkR --no-save mySparkScript.r etc., but every time I get the below…
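spark-submit dispatches .R/.r files to the SparkR runner automatically, so on an EMR cluster the script can be submitted like any Scala or Python job. A sketch (master, deploy mode and resource sizes are assumptions for a typical EMR/YARN setup):

```shell
# Basic submission: YARN master, driver runs on the node you submit from.
spark-submit --master yarn --deploy-mode client mySparkRScript.r

# Spark options and script arguments work the same as for other languages.
spark-submit --master yarn \
  --driver-memory 4g \
  --num-executors 10 \
  mySparkRScript.r arg1 arg2
```

Inside the script, create the session yourself (sparkR.session() on Spark 2.x+), since spark-submit does not start an interactive SparkR shell.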
4 votes, 1 answer

SparkR Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState'

I am installing SparkR on my Windows 8.1 machine from this tutorial: https://www.linkedin.com/pulse/setting-up-sparkr-windows-machine-ramabhadran-kapistalam. I finished it, so I guess it's set up correctly. The problem is when I try to run an example with a…
4 votes, 0 answers

matrix operation in sparkR

For creating matrices, the Spark documentation gives 4 data types: 1. RowMatrix, 2. IndexedRowMatrix, 3. CoordinateMatrix and 4. BlockMatrix. But all the data types are explained only for Scala, Java and Python. I want to perform matrix operations in Spark…
Siddhu