Questions tagged [sparklyr]

sparklyr is an alternative R interface for Apache Spark

sparklyr provides an alternative to the SparkR interface for Apache Spark, built on top of dplyr.

784 questions
0 votes, 3 answers

Could not parse Master URL: 'spark.bluemix.net'

I'm trying to connect to IBM's Spark as a Service running on Bluemix from RStudio running on my desktop machine. I have copied the config.yml from the automatically configured RStudio environment running on IBM's Data Science Experience: default: …
Chris Snow
0 votes, 1 answer

RStudio/sparklyr on MapR/Spark - Replace , with . in string

I have a Spark DataFrame tbl_pred with the following factor column: **Value** 13,3 11 5,3. I'd like to convert those 'strings' to numeric values. I can use the as.numeric function, but this doesn't work because my decimal separator is a comma. tbl_pred…
user3331966
0 votes, 0 answers

Change Class of Column to Date in sparklyr Spark DataFrame

I am working with sparklyr and am having trouble changing column classes along with using dplyr to aggregate the data. This is my code currently: .libPaths(c(.libPaths(), '/usr/lib/spark/R/lib')) Sys.setenv(SPARK_HOME =…
nak5120
0 votes, 1 answer

Can Sparklyr be used on a local machine to get around R's memory limitations?

I need to fit GLMs on data that doesn't fit into my computer's memory. Usually to get around this issue, I would sample data, fit the model and then test on a different sample that would sit out of memory. This has been R's major limitation for me…
0 votes, 1 answer

Fail to connect to Spark with sparklyr

I am trying to connect to Spark using the sparklyr package in R and I am getting the following error: library(sparklyr) > library(dplyr) > config <- spark_config() > config[["sparklyr.shell.conf"]] <-…
Rami Krispin
0 votes, 1 answer

Serialize SparkR DataFrame to jobj

I'd like to be able to use the Java methods on a SparkR SparkDataFrame to write data to Cassandra. Using the sparklyr extensions for example, I can do something like this: sparklyr::invoke(sparklyr::spark_dataframe(spark_tbl), "write") %>>%…
Akhil Nair
0 votes, 0 answers

Dynamic mutate_each in dplyr

I have the following columns in my dataframe: c1_sum | c2_sum | d | c1 | c2 The columns c# and c#_sum are dynamic. I'm trying to do something like this for all c#: mutate(c#_weight = (d * c#) / c#_sum) The final result would be: c1_sum | c2_sum |…
Raphael Sampaio
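A minimal sketch for the dynamic-column question above, assuming the column layout from the excerpt (c#, c#_sum and d); the data values are made up for illustration. Instead of the superseded mutate_each(), it builds one expression per c# column with rlang and splices them all into a single mutate(). Because the expressions are plain column arithmetic, the same call should also translate to SQL when df is a tbl_spark, but that is an assumption rather than something shown here.

```r
library(dplyr)
library(rlang)

# Hypothetical data: two dynamic c# columns, their c#_sum totals, and d
df <- tibble(
  c1_sum = c(10, 20), c2_sum = c(5, 8),
  d = c(2, 3), c1 = c(1, 4), c2 = c(2, 2)
)

# Select the base columns (c1, c2, ...); the "$" anchor excludes c#_sum.
# colnames() works for local tibbles and for remote (Spark) tables.
base_cols <- grep("^c[0-9]+$", colnames(df), value = TRUE)

# Build one expression per column: c#_weight = (d * c#) / c#_sum
weights <- lapply(base_cols, function(col) {
  expr((d * !!sym(col)) / !!sym(paste0(col, "_sum")))
})
names(weights) <- paste0(base_cols, "_weight")

# Splice every named expression into one mutate() call
result <- df %>% mutate(!!!weights)
print(result)
```

Generating a single mutate() keeps the whole computation in one query, which matters when the table lives in Spark rather than in local memory.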
0 votes, 1 answer

Reading graph from file

I'm looking to run a GraphX example on my Windows machine using spark-shell from the sparklyr installation of Hadoop/Spark. I am able to launch the shell from the install directory here first: start…
eyeOfTheStorm
0 votes, 1 answer

Looking to sort a Spark Data Frame by Index using SparklyR

library(sparklyr) library(dplyr) library(Lahman) spark_install(version = "2.0.0") sc <- spark_connect(master = "local") batting_tbl <- copy_to(sc, Lahman::Batting, "batting"); batting_tbl batting_tbl %>% arrange(-index()) # Error:…
eyeOfTheStorm
0 votes, 1 answer

Cannot load SQL table into R through SparkR

I'm trying to load an SQL table into R through SparkR. I have the following code: Sys.setenv(SPARK_HOME = "C:/Users/hms/Desktop/spark-2.0.1-bin-hadoop2.7/spark-2.0.1-bin-hadoop2.7", HADOOP_HOME =…
hsilva
0 votes, 1 answer

is.na and quantile with sparklyr

I am using sparklyr and it seems to be working well. However, some of my existing code will not run. When I use complete.cases, I get Error: org.apache.spark.sql.AnalysisException: undefined function COMPLETE.CASES. I get the same…
Levi Brackman
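complete.cases() is a base R function with no SQL translation, which is consistent with Spark reporting it as an undefined function. A hedged alternative, shown on a small hypothetical tibble: build one !is.na(col) condition per column and splice them into filter(). dbplyr translates is.na() into an IS NULL check, so the same code should run against a tbl_spark, though that part is an assumption not exercised here.

```r
library(dplyr)
library(rlang)

# Hypothetical local data standing in for a Spark table
df <- tibble(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d")
)

# One "!is.na(<col>)" condition per column; colnames() also works on
# remote tables, whereas names() does not return their columns
not_missing <- lapply(colnames(df), function(col) expr(!is.na(!!sym(col))))

# filter() combines the spliced conditions with AND, keeping only
# rows where every column is non-missing (like complete.cases)
complete_rows <- df %>% filter(!!!not_missing)
print(complete_rows)
```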
0 votes, 1 answer

Error while connecting sparklyr to remote SparkR in RStudio

I tried the following command in my local RStudio session to connect to SparkR: sc <- spark_connect(master = "spark://x.x.x.x:7077", spark_home = "/home/hduser/spark-2.0.0-bin-hadoop2.7", version="2.0.0", config = list()) But I am getting the following…
r4sn4
0 votes, 4 answers

Connect R to Spark through sparklyr

I'm trying to connect R to Spark following the sparklyr tutorial from RStudio: http://spark.rstudio.com/ But somehow I'm getting a weird error message, shown below. Does anyone know how to solve this? I have tried to add the C:\Windows\system32 path…
user1514373
-1 votes, 1 answer

How to use dplyr in sparklyr

Hello, I am just getting started with sparklyr and I am getting an error when trying to use dplyr to wrangle some data. library(sparklyr) sc <- spark_connect(master = "local") spark_read_csv(sc, "df2_tbl", "C:/Users/...csv") spark_read_csv(sc,…
Kreitz Gigs
-2 votes, 1 answer

Running parallel function calls with sparklyr

Currently, I am using a foreach loop from the doParallel library to run function calls in parallel across multiple cores of the same machine, which looks something like this: out_results=foreach(i =1:length(some_list))%dopar% { …