Questions tagged [sparklyr]

sparklyr is an alternative R interface for Apache Spark

sparklyr provides an alternative to the SparkR interface for Apache Spark, built on top of dplyr.

784 questions
0
votes
1 answer

Overwrite a Spark DataFrame into location

I want to save my Spark DataFrame into a directory using a spark_write_* function like this: spark_write_csv(df, "file:///home/me/dir/") but if the directory already exists I get an error: ERROR: org.apache.spark.sql.AnalysisException: path…
bartektartanus
  • 15,284
  • 6
  • 74
  • 102
0
votes
0 answers

Sparklyr ml_linear_regression Error: Invalid method setForceIndexLabel

I am trying to perform linear regression using SparklyR on an EMR cluster, and receiving the error below. The connection to Spark seems fine, and I have tried using several different datasets, but they all result in the same error. I am looking…
0
votes
1 answer

r sparklyr spark_apply Error: org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous

I am trying to use spark_apply on a spark cluster to calculate kmeans on data grouped by two columns. The data is queried from Hive and looks like this > samplog1 # Source: lazy query [?? x 6] # Database: spark_connection …
Chris Njuguna
  • 335
  • 1
  • 3
  • 12
0
votes
1 answer

Does opencpu support asynchronous calls for time-consuming R functions?

I have recently created an R package that makes use of sparklyr's capabilities. I invoke the package's main function from opencpu and pass as an argument a URL with all my data as a stream. The data stream is successfully analysed in a distributed way via…
efotopoulou
  • 139
  • 3
  • 13
0
votes
1 answer

Can't connect sparklyr to shiny

I'm pretty new to Shiny and Spark. I want to deploy a ShinyApp with a spark connection. Everything works how it should when I just hit RunApp, but whenever I try to publish it, I get the error: "Error in value[3L] : SPARK_HOME directory…
Alex
  • 1
  • 2
0
votes
1 answer

mclapply and spark_read_parquet

I am relatively new as an active user of the forum, but first I have to thank you all for your contributions, because I have been looking for answers here for years... Today, I have a question that nobody has solved or that I am not able to find... I am trying to…
0
votes
1 answer

How to show memory usage of DataFrames using sparklyr?

Similar to this code snippet that lists the memory usage of objects in the local R environment, is there a similar command to see the memory of DataFrames available in a Spark connection? E.g. Something similar to src_tbls(sc), that currently only…
Alex
  • 15,186
  • 15
  • 73
  • 127
0
votes
0 answers

Issues installing and connecting using sparklyr in R

I'm having issues trying to connect using sparklyr. install.packages('sparklyr') require(sparklyr) spark_install() sc <- spark_connect(master = "local") I've had a few errors I worked through, like my dplyr version not being up to date, and something…
Matt W.
  • 3,692
  • 2
  • 23
  • 46
0
votes
1 answer

sparklyr spark_apply user defined function error

I'm trying to pass a custom R function inside spark_apply but keep running into issues and can't figure out what some of the errors mean. library(sparklyr) sc <- spark_connect(master = "local") perf_df <- data.frame(predicted = c(5, 7, 20), …
user3527301
  • 23
  • 1
  • 5
0
votes
0 answers

Correlation matrix of a Spark table (Spark DataFrame) in R

I want to calculate the correlation matrix of a Spark table in R. I tried using cor() as in R, but it does not work. Here is the code: library(sparklyr) library(dplyr) sc <- spark_connect(master = "local") flights_tbl <- copy_to(sc,…
Joe
  • 561
  • 1
  • 9
  • 26
0
votes
1 answer

How do I use the spark-sql "range between" clause for a window operation with sparklyr

Context: I have a large table with logon times. I want to calculate a rolling count of logons within a specified period (e.g. 3600 sec). In SQL/HQL I would specify this as: SELECT id, logon_time, COUNT(*) OVER( PARTITION BY id ORDER BY logon_time…
rookie error
  • 165
  • 1
  • 7
0
votes
1 answer

Connecting to Cassandra Data using Sparklyr

I am using RStudio. I installed a local version of Spark, ran a few things, and was quite happy. Now I am trying to read my actual data from a cluster, using RStudio Server and a standalone version of Spark. The data is in Cassandra, and I do not know how to…
0
votes
2 answers

Install sparklyr with initialize_connect error

I'm trying to follow the simple guide on SparklyR, but it throws me errors right at the very beginning. I install SparklyR and a local version of Spark as written in the guide: library("sparklyr") spark_install(version="1.6.2") I then open a…
Akira
  • 273
  • 5
  • 15
0
votes
0 answers

sparklyr and local Spark on YARN-Cluster without Cloudera

I have tried this combination without Cloudera but failed. With Cloudera, I tried following the tutorial "sparklyr: a test drive on YARN". I wonder if anyone has had success without needing to…
Charlie
  • 11
  • 2
0
votes
1 answer

Continuous "Got IO error when sending batch UDP bytes: java.net.ConnectException: Connection refused" in RSparkling on CDH-5.10.2

I'm trying to execute this RSparkling example on an offline CDH-5.10.2 cluster. My environment is: Spark 1.6.0; sparklyr 0.6.2; h2o 3.10.5.2; rsparkling 0.2.1. I use a custom Sparkling Water JAR which is basically 1.6.12 with this PR…
Igor Melnichenko
  • 134
  • 2
  • 13