Questions tagged [sparklyr]

sparklyr is an alternative R interface for Apache Spark

784 questions
0
votes
1 answer

Error "Invalid method csv for object" when using spark_read_csv in sparklyr

I'm trying to read data in R from HDFS. One thing I'm struggling with when using sparklyr is deciphering the error messages ...because I am not a Java programmer. Consider this example: DO THIS IN R create abalone dataframe - abalone is a…
schristel
  • 245
  • 1
  • 13
0
votes
1 answer

Sparklyr: sdf_copy_to fails with 350 MB dataset

I'm facing a problem trying to write 2 datasets using sparklyr::spark_write_csv(). This is my configuration: # Configure cluster config <- spark_config() config$spark.yarn.keytab <- "mykeytab.keytab" config$spark.yarn.principal <-…
0
votes
1 answer

sparklyr for big csv file

I am trying to load a dataset with a million rows and 1000 columns with sparklyr. I am running Spark on a very big cluster at work. Still the size of the data seems to be too big. I have tried two different approaches: This is the dataset:…
Felix
  • 309
  • 2
  • 12
0
votes
1 answer

java.lang.ClassNotFoundException: org.apache.spark.h2o.H2OContext

library(rsparkling) library(sparklyr) library(h2o) test <- as_h2o_frame(sc, partitions$test, strict_version_check = FALSE) the error is following: Error: java.lang.ClassNotFoundException: org.apache.spark.h2o.H2OContext at…
Jing Ran
  • 1
  • 1
0
votes
0 answers

how to achieve the same result with sparklyr on a spark dataframe as with dplyr on an R dataframe?

The following code calculates a set of regression coefficients for each of three dependent variables regressed on the set of six independent variables for each of two groups, and it works fine. library(tidyverse) library(broom) n <- 20 df4 <-…
0
votes
1 answer

deleted tables when using left_join with sparklyr

I'm working with some tables that I want to join. Because of the tables' size, I use sparklyr with dplyr's left_join. Here is the code sample: query.1 <- left_join(pa11, pa12, by = c("CODIGO_HAB_D","ID_EST","ID_ME","ID_PARTE_D","ID_PAR",…
nidabdella
  • 811
  • 8
  • 24
0
votes
1 answer

Type mismatch error for filter function with dplyr over a spark data frame

I am currently working in RStudio on a RHEL cluster. I use Spark 2.0.2 with a YARN client and have installed the following versions of sparklyr and dplyr: sparklyr_0.5.4; dplyr_0.5.0. A simple test along the following lines results in an error: data =…
Param
  • 47
  • 6
0
votes
0 answers

Is there a way to activate local spark cluster that queries database directly?

I am attempting to use the sparklyr package to connect to an existing MS SQL database to query data faster than is possible with the RODBC package. Currently, I am able to successfully query the database using RODBC::odbcConnect() and…
tbradley
  • 2,210
  • 11
  • 20
0
votes
2 answers

I am trying to change all the column names of the data whose class is tbl_spark

Here is the code: library(sparklyr) sc <- spark_connect(master = "local", config = list()) iris_tbl <- copy_to(sc, iris, overwrite = T) newColList <- c("a", "b" , "c" , "d" , " e") colnames(iris_tbl) <- newColList Error:…
Priyanka
  • 11
  • 3
0
votes
0 answers

Sparklyr: How to attach a group by to invoke method?

I have this Spark table: xydata y: num 11.00 22.00 33.00 ... x0: num 1.00 2.00 3.00 ... x1: num 2.00 3.00 4.00 ... ... x788: num 2.00 3.00 4.00 ... and a handle named xy_df that is connected to this table. I want to invoke the selectExpr function…
Benny Suryajaya
  • 63
  • 1
  • 12
0
votes
0 answers

weighted linear regression with Spark + R

I am using Spark from R, via the sparklyr package, to run a regression on a huge dataset (>500mill obs). But I wanted a weighted regression and I can't seem to find the correct syntax / function to do that. Currently I am doing…
Hernando Casas
  • 2,837
  • 4
  • 21
  • 30
0
votes
0 answers

Sparklyr: how to apply an operation between a column in Spark table and an R dataframe?

I have this Spark table: xydata y: num 11.00 22.00 33.00 ... x0: num 1.00 2.00 3.00 ... x1: num 2.00 3.00 4.00 ... ... x788: num 2.00 3.00 4.00 ... And this dataframe in R environment: penalty p: num 1.23 2.34 3.45 ... with the number of rows in…
Benny Suryajaya
  • 63
  • 1
  • 12
0
votes
1 answer

Commands in Sparklyr (R Studio)

What's the difference between the sdf_register and the copy_to command in sparklyr? When do you use each command?
Ark
  • 93
  • 2
  • 6
0
votes
2 answers

read csv function sparklyr error

I'm trying to read a CSV file into RStudio with the sparklyr package on a Google Compute cluster. This is the configuration: Test Spark framework install.packages("sparklyr") install.packages("dplyr") library(sparklyr) spark_install(version =…
albit paoli
  • 161
  • 2
  • 11
0
votes
1 answer

SparklyR/Spark SQL split string into multiple columns based on number of bytes/character count

I have a spark dataframe TABLE1 with one column with 100000 rows each contains a string of the identical length AA105LONDEN 03162017045262017 16953563ABCDEF and I would like to separate each row into multiple columns based on the lines…
Levi Brackman
  • 325
  • 2
  • 17