I want to save my Spark DataFrame into a directory using a spark_write_* function, like this:
spark_write_csv(df, "file:///home/me/dir/")
but if the directory already exists, I get an error:
ERROR: org.apache.spark.sql.AnalysisException: path…
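The usual fix is spark_write_csv's mode argument, which accepts the standard Spark save modes. A minimal sketch, assuming a local connection and reusing the path from the question:

library(sparklyr)
sc <- spark_connect(master = "local")
df <- sdf_copy_to(sc, iris, overwrite = TRUE)
# mode = "overwrite" replaces an existing directory instead of failing;
# "append" adds new part files to it instead.
spark_write_csv(df, "file:///home/me/dir/", mode = "overwrite")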
I am trying to perform linear regression using sparklyr on an EMR cluster, and I am receiving the error below. The connection to Spark seems fine, and I have tried several different datasets, but they all result in the same error. I am looking…
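For comparison, a minimal sketch of a sparklyr linear regression that works on a local connection (mtcars stands in for the questioner's datasets; the formula interface assumes a reasonably recent sparklyr):

library(sparklyr)
sc <- spark_connect(master = "local")
mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)
# Fit a Spark MLlib linear model through the formula interface
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)
summary(fit)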
I am trying to use spark_apply on a Spark cluster to run kmeans on data grouped by two columns. The data is queried from Hive and looks like this:
> samplog1
# Source: lazy query [?? x 6]
# Database: spark_connection
…
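A sketch of the grouped spark_apply pattern, with grp1, grp2, x and y as hypothetical stand-ins for the real column names hidden in the truncated output above:

library(sparklyr)
library(dplyr)
# kmeans() runs once per (grp1, grp2) group on the workers; the
# closure must be self-contained because it is serialized to them.
result <- samplog1 %>%
  spark_apply(
    function(df) {
      km <- kmeans(df[, c("x", "y")], centers = 2)
      cbind(df, cluster = km$cluster)
    },
    group_by = c("grp1", "grp2")
  )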
I have recently created an R package that makes use of sparklyr's capabilities. I invoke the package's main function from OpenCPU and pass as an argument a URL with all my data as a stream. The data stream is successfully analysed in a distributed way via…
I'm pretty new to Shiny and Spark.
I want to deploy a Shiny app with a Spark connection. Everything works as it should when I just hit Run App, but whenever I try to publish it, I get the error: "Error in value[3L] :
SPARK_HOME directory…
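A common workaround sketch: set SPARK_HOME explicitly at the top of app.R, since spark_install() only affects the machine it runs on and the publishing server needs its own Spark installation (the path below is an assumption):

library(sparklyr)
# Hypothetical path; it must point at a real Spark installation on
# the server the app is published to.
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
sc <- spark_connect(master = "local",
                    spark_home = Sys.getenv("SPARK_HOME"))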
I am relatively new as an active user of the forum, but first I have to thank you all for your contributions, because I have been looking for answers here for years...
Today, I have a question that nobody has solved, or whose answer I am not able to find...
I am trying to…
Similar to this code snippet that lists the memory usage of objects in the local R environment, is there a command to see the memory used by the DataFrames available in a Spark connection? E.g. something similar to src_tbls(sc), which currently only…
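I am not aware of a dedicated sparklyr helper; one sketch, assuming a Spark backend where the JVM method SparkContext.getRDDStorageInfo() is still exposed, queries the storage info through invoke():

library(sparklyr)
sc <- spark_connect(master = "local")
# One RDDInfo per cached RDD; memSize() is the in-memory footprint in
# bytes. Whether this JVM method exists depends on the Spark version.
storage <- invoke(spark_context(sc), "getRDDStorageInfo")
data.frame(
  name      = sapply(storage, invoke, "name"),
  mem_bytes = sapply(storage, invoke, "memSize")
)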
I'm having issues trying to connect using sparklyr.
install.packages('sparklyr')
require(sparklyr)
spark_install()
sc <- spark_connect(master = "local")
I've had a few errors I worked through, like my dplyr version not being up to date, and something…
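When debugging this kind of failure it can help to check what sparklyr actually has on disk before connecting; a sketch using only sparklyr's own helpers (the pinned version is an arbitrary example):

library(sparklyr)
spark_installed_versions()  # Spark builds spark_install() has downloaded
spark_available_versions()  # versions sparklyr knows how to fetch
# Connecting with an explicit version avoids picking up a stale install
sc <- spark_connect(master = "local", version = "2.1.0")
spark_version(sc)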
I'm trying to pass a custom R function inside spark_apply, but I keep running into issues and can't figure out what some of the errors mean.
library(sparklyr)
sc <- spark_connect(master = "local")
perf_df <- data.frame(predicted = c(5, 7, 20),
…
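A sketch of the basic pattern, completing the truncated data frame above with a hypothetical actual column; the function passed to spark_apply runs on the workers, so it must be self-contained:

library(sparklyr)
sc <- spark_connect(master = "local")
perf_df <- data.frame(predicted = c(5, 7, 20),
                      actual    = c(4, 6, 40))   # hypothetical values
perf_tbl <- sdf_copy_to(sc, perf_df, overwrite = TRUE)
# Everything the closure needs must be defined inside it (or shipped
# via the `context` argument); it cannot see the local environment.
rmse_tbl <- spark_apply(perf_tbl, function(df) {
  data.frame(rmse = sqrt(mean((df$predicted - df$actual)^2)))
})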
I want to calculate the correlation matrix of a Spark table in R. I tried using cor() as in R, but it does not work; here is the code:
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
flights_tbl <- copy_to(sc,…
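cor() only works on local data frames, not on a remote tbl_spark. A sketch using ml_corr(), which newer sparklyr releases provide for computing the correlation matrix inside Spark (the flights data is assumed from the usual sparklyr guide):

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
flights_tbl <- copy_to(sc, nycflights13::flights, "flights")
# Pearson correlation matrix computed by Spark, not in local R
flights_tbl %>%
  select(dep_delay, arr_delay, distance) %>%
  na.omit() %>%
  ml_corr()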
Context: I have a large table with logon times. I want to calculate a rolling count of logons within a specified period (e.g. 3600 sec).
In SQL/HQL I would specify this as:
SELECT id, logon_time, COUNT(*) OVER(
PARTITION BY id ORDER BY logon_time…
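Spark SQL supports the same window clause, so one sketch is to send the query through DBI unchanged; RANGE frames need a numeric ORDER BY column, hence the unix_timestamp() cast (the table name logons is a placeholder):

library(sparklyr)
library(DBI)
sc <- spark_connect(master = "local")
dbGetQuery(sc, "
  SELECT id, logon_time,
         COUNT(*) OVER (
           PARTITION BY id
           ORDER BY unix_timestamp(logon_time)
           RANGE BETWEEN 3600 PRECEDING AND CURRENT ROW
         ) AS logon_count
  FROM logons
")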
I am using RStudio. I installed a local version of Spark, ran a few things, and was quite happy. Now I am trying to read my actual data from a cluster, using RStudio Server and a standalone version of Spark. The data is in Cassandra, and I do not know how to…
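A sketch of reading Cassandra through the DataStax spark-cassandra-connector, loaded as a Spark package at connect time; the connector version, host, master URL, keyspace and table are all assumptions to adapt:

library(sparklyr)
conf <- spark_config()
# Connector coordinates must match your Spark/Scala build
conf$sparklyr.defaultPackages <-
  "com.datastax.spark:spark-cassandra-connector_2.11:2.0.5"
conf$spark.cassandra.connection.host <- "cassandra-host.example.com"
sc <- spark_connect(master = "spark://master:7077", config = conf)
# Expose a Cassandra table as a Spark DataFrame
logs_tbl <- spark_read_source(
  sc, name = "logs",
  source  = "org.apache.spark.sql.cassandra",
  options = list(keyspace = "my_keyspace", table = "my_table")
)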
I'm trying to follow the simple guide on sparklyr, but it throws errors right at the very beginning. I install sparklyr and a local version of Spark as described in the guide:
library("sparklyr")
spark_install(version = "1.6.2")
I then open a…
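Before going further with the guide, a sketch for verifying the install step in isolation (same version as above):

library(sparklyr)
spark_install(version = "1.6.2")
spark_installed_versions()  # confirm 1.6.2 actually landed on disk
sc <- spark_connect(master = "local", version = "1.6.2")
spark_version(sc)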
I have tried this combination without Cloudera, but failed.
With Cloudera, I tried following the tutorial "sparklyr: a test drive on YARN".
I wonder if anyone has had success without the need to…
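For reference, a sketch of the usual CDH/YARN connection, assuming the stock Cloudera parcel layout (both paths are assumptions for a given cluster):

library(sparklyr)
# Typical CDH parcel locations; adjust for the cluster at hand
Sys.setenv(SPARK_HOME = "/opt/cloudera/parcels/CDH/lib/spark")
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")
sc <- spark_connect(master = "yarn-client",
                    spark_home = Sys.getenv("SPARK_HOME"))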
I'm trying to execute this RSparkling example on an offline CDH-5.10.2 cluster. My environment is:
Spark 1.6.0;
sparklyr 0.6.2;
h2o 3.10.5.2;
rsparkling 0.2.1.
I use a custom Sparkling Water JAR, which is basically 1.6.12 with this PR…
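For an offline cluster, rsparkling can be pointed at a local Sparkling Water assembly instead of downloading one; a sketch using rsparkling's documented options (the JAR path is a placeholder):

library(sparklyr)
# Set before loading rsparkling so it skips the download step
options(rsparkling.sparklingwater.version = "1.6.12")
options(rsparkling.sparklingwater.location =
          "/path/to/sparkling-water-assembly_2.10-1.6.12-all.jar")
library(rsparkling)
sc <- spark_connect(master = "yarn-client", version = "1.6.0")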