Questions tagged [sparkr]

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.

SparkR exposes the Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster.

SparkR exposes the RDD API of Spark as distributed lists in R.

796 questions
0 votes, 1 answer

Connecting SparkR to the spark cluster

I have a Spark cluster running on 10 machines (1 - 10) with the master on machine 1. All of them run CentOS 6.4. I am trying to connect a JupyterHub installation (which is running inside an Ubuntu Docker container because of issues with installing on…
user3612324

0 votes, 2 answers

SparkR - Generate quantiles from a dataframe column (numeric type)

I'm exploring SparkR to compute quantiles of a numeric column in a CSV file (located in S3). I'm able to parse the CSV file, print the documents and access the column, but I'm not sure how to generate quantiles. Any help would be appreciated. PS: R…
devsathish

0 votes, 1 answer

hdfs: no such file or directory error when reading parquetfile in sparkR shell

I want to read a Parquet file in the sparkR shell from HDFS, so I do the following: ./sparkR --master yarn-client sqlContext <- sparkRSQL.init(sc) path<-"hdfs://year=2015/month=1/day=9" AppDF <- parquetFile(sqlContext, path) Error: No such file…
ysfseu

0 votes, 1 answer

sparkr on ec2 : ensure that workers are registered and have sufficient memory

I set up a Spark (spark-1.4.0) cluster on EC2 using the spark-ec2 script that comes with the release. It starts up fine with the master and one slave, and I am able to check the status on http://:8080. Now I'd like to run SparkR on my cluster, this…
hadron

0 votes, 1 answer

Aggregation statistics in sparkR 1.4.0

I am a regular R user. For a data.frame that looks like the one below, I would like to compute basic aggregation statistics: minimum, 1st quartile, median, 3rd quartile and maximum. The following code using reshape2 package and dplyr to proceed with…
Marcin

0 votes, 1 answer

sparkR Installation Issue 1.4.1

I have tried the following for both Spark 1.4.0 and 1.4.1 on a Mac. I am downloading the package type 'Source Code [can build several Hadoop versions]' with download type http://ftp.wayne.edu/apache/spark/spark-1.4.1/spark-1.4.1.tgz. When I run…
Danny M.

0 votes, 1 answer

cannot create root directory in sparkR on AWS

While taking my first steps connecting SparkR to an AWS cluster, I came across a problem: I cannot create a SparkContext ('sc') in RStudio: > .libPaths( c( .libPaths(), '/root/spark/R/lib') ) > Sys.setenv(SPARK_HOME = '/root/spark') > Sys.setenv(PATH =…
Zahiro Mor

0 votes, 1 answer

SparkR sql Context Error after initiating spark R Context job

I have installed the SparkR package and I am able to run other computation jobs, like the pi computation or word counts in a document. But when I try to initiate a sparkRSQL job, it gives an error. Can anyone help me out? I am using R version…
user459

0 votes, 1 answer

Error with sparkR installation for R

I am trying to install the SparkR package on my Windows 7 RStudio installation. So far I have the newest version of R (3.2.0) and RStudio (0.98.1103). After that I looked at different sources to get an idea of how to install SparkR. Afterwards I first…
Patrick C.

0 votes, 1 answer

How to row bind two data frames in SparkR

In R we use rbind() to bind two data frames, e.g. rbind(X, Y). How can we do the same in SparkR in Spark 1.4? TIA, Arun
Arun Gunalan

0 votes, 1 answer

Error while Using a function of an existing package in R with SparkR

So, I installed SparkR using the steps given at the link "https://amplab-extras.github.io/SparkR-pkg/", installing it from the GitHub repository with the direct command given on the website. Now, here goes my code: library(SparkR) sc <-…
John Lui

0 votes, 1 answer

sparkR 1.4.0 : how to include jars

I'm trying to hook SparkR 1.4.0 up to Elasticsearch using the elasticsearch-hadoop-2.1.0.rc1.jar file. It's requiring a bit of hacking together, calling the SparkR:::callJMethod function. I need to get a jobj R object for a couple…

0 votes, 1 answer

Unable to launch sparkR shell in spark-1.4.0

I downloaded Spark-1.4.0 today and tried to launch the sparkR shell in both Linux and Windows environments, but the sparkR command from the bin directory is not working. If anyone has successfully launched the sparkR shell, please let me know. Thanks, Sanjay

0 votes, 2 answers

Error while installing SparkR package using install_github

I am trying to use the SparkR package in R. I have all dependent packages like devtools, Rtools.exe, etc. When I try the following command: install_github("amplab-extras/SparkR-pkg",subdir="pkg") I get the following error: Downloading github repo…
Umesh K

-1 votes, 1 answer

Can we connect to Databricks remotely using REST APIs (without Databricks Connect)?

I want to connect to Databricks from a remote server using RStudio. Can we do it without Databricks Connect? Is it possible to connect using the Databricks REST APIs and use SparkR/sparklyr operations after connecting?
alpha123