I'm trying to read data into R from HDFS. One thing I'm struggling with when using sparklyr is deciphering the error messages, because I am not a Java programmer.
Consider this example:
# Do this in R
# create abalone dataframe - abalone is a…
I'm facing a problem trying to write two datasets using sparklyr::spark_write_csv(). This is my configuration:
# Configure cluster
config <- spark_config()
config$spark.yarn.keytab <- "mykeytab.keytab"
config$spark.yarn.principal <-…
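For illustration, a minimal sketch of how the two writes might look once the connection is up; the tbl names (tbl_a, tbl_b) and HDFS output paths are hypothetical placeholders:

library(sparklyr)

sc <- spark_connect(master = "yarn-client", config = config)

# tbl_a / tbl_b stand in for the two Spark tbls being written out
spark_write_csv(tbl_a, path = "hdfs:///tmp/output_a", header = TRUE, mode = "overwrite")
spark_write_csv(tbl_b, path = "hdfs:///tmp/output_b", header = TRUE, mode = "overwrite")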
I am trying to load a dataset with a million rows and 1000 columns with sparklyr.
I am running Spark on a very big cluster at work, but the data still seems to be too large to handle. I have tried two different approaches:
This is the dataset:…
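For context, here are two approaches commonly tried for data of this shape; these are generic sketches, not necessarily the two from the question, and the file path and table name are hypothetical:

library(sparklyr)
sc <- spark_connect(master = "yarn-client")

# Approach 1: let Spark read the file directly from HDFS,
# without caching it in memory up front
wide_tbl <- spark_read_csv(sc, name = "wide_data",
                           path = "hdfs:///data/wide.csv",
                           memory = FALSE, infer_schema = TRUE)

# Approach 2: read into R first and copy the data frame to Spark
# (usually the slower option for data this size)
local_df <- read.csv("wide.csv")
wide_tbl <- copy_to(sc, local_df, name = "wide_data", overwrite = TRUE)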
The following code calculates a set of regression coefficients for each of three dependent variables regressed on the set of six independent variables, for each of two groups, and it works fine.
library(tidyverse)
library(broom)
n <- 20
df4 <-…
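A minimal sketch of that kind of per-group, per-outcome regression with broom; the column names (grp, y1:y3, x1:x6) are hypothetical stand-ins for the real columns in df4:

library(tidyverse)
library(broom)

coefs <- df4 %>%
  pivot_longer(c(y1, y2, y3), names_to = "dv", values_to = "y") %>%   # one row per outcome value
  nest_by(grp, dv) %>%                                                # one data set per group x outcome
  mutate(fit = list(lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = data))) %>%
  summarise(tidy(fit), .groups = "drop")                              # coefficient table per fit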
I'm working with some tables that I want to join; because of their size I use sparklyr with dplyr's left_join.
Here is the code sample:
query.1 <- left_join(pa11, pa12, by = c("CODIGO_HAB_D","ID_EST","ID_ME","ID_PARTE_D","ID_PAR",…
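As a general pattern, a hedged sketch of the same kind of join between two Spark tbls, with compute() added to force execution and cache the result; pa11 and pa12 are assumed to already be tbl_spark references, and only the join keys visible in the snippet are used (the rest are truncated above):

library(sparklyr)
library(dplyr)

query.1 <- left_join(pa11, pa12,
                     by = c("CODIGO_HAB_D", "ID_EST", "ID_ME", "ID_PARTE_D", "ID_PAR")) %>%
  compute("query_1")   # materialise the joined table in Spark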
I am currently working in RStudio on a RHEL cluster.
I use Spark 2.0.2 with a YARN client and have installed the following versions of sparklyr and dplyr:
sparklyr_0.5.4;
dplyr_0.5.0
A simple test on the following lines results in an error:
data =…
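The exact test lines are truncated; a typical minimal test under that setup would be a connect plus a copy_to round trip, roughly like this sketch:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client", version = "2.0.2")

# copy a small built-in data set to Spark and read a summary back
data_tbl <- copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)
data_tbl %>% count() %>% collect()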
I am attempting to use the sparklyr package to connect to an existing MS SQL database to query data faster than is possible with the RODBC package. Currently, I am able to successfully query the database using RODBC::odbcConnect() and…
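One hedged route for this is Spark's JDBC data source via spark_read_jdbc(); the jar path, connection URL, table name, and credentials below are all assumptions to adapt:

library(sparklyr)

config <- spark_config()
# path to the Microsoft SQL Server JDBC driver jar (placeholder path)
config$sparklyr.jars.default <- "/path/to/mssql-jdbc.jar"

sc <- spark_connect(master = "local", config = config)

tbl_sql <- spark_read_jdbc(
  sc, name = "my_table",
  options = list(
    url      = "jdbc:sqlserver://myserver:1433;databaseName=mydb",
    dbtable  = "dbo.my_table",
    user     = "me",
    password = "secret",
    driver   = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  )
)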
I have this Spark table:
xydata
y: num 11.00 22.00 33.00 ...
x0: num 1.00 2.00 3.00 ...
x1: num 2.00 3.00 4.00 ...
...
x788: num 2.00 3.00 4.00 ...
and a handle named xy_df that is connected to this table.
I want to invoke the selectExpr function…
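A hedged sketch of calling selectExpr through sparklyr's invoke() interface on the underlying Java DataFrame; the expressions and the registered table name are illustrative only:

library(sparklyr)
library(dplyr)

result <- xy_df %>%
  spark_dataframe() %>%                                      # underlying Spark DataFrame (jobj)
  invoke("selectExpr", list("y", "x0 + x1 AS x_sum")) %>%    # arbitrary SQL expressions
  sdf_register("xy_selected")                                # back to a dplyr-compatible tbl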
I am using Spark from R, via the sparklyr package, to run a regression on a huge dataset (>500 million obs), but I want a weighted regression and I can't seem to find the correct syntax / function to do that.
Currently I am doing…
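One hedged option is the weight column argument on sparklyr's ML regression functions (available in recent sparklyr versions); the table, formula, and weight column names below are assumptions:

library(sparklyr)

# big_tbl is assumed to be a tbl_spark with outcome y, predictors x1..x3,
# and a column w holding the observation weights
fit <- ml_linear_regression(big_tbl, y ~ x1 + x2 + x3, weight_col = "w")
summary(fit)

# the generalized linear interface also accepts a weight column
fit_glm <- ml_generalized_linear_regression(big_tbl, y ~ x1 + x2 + x3,
                                            family = "gaussian", weight_col = "w")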
I have this Spark table:
xydata
y: num 11.00 22.00 33.00 ...
x0: num 1.00 2.00 3.00 ...
x1: num 2.00 3.00 4.00 ...
...
x788: num 2.00 3.00 4.00 ...
And this data frame in the R environment:
penalty
p: num 1.23 2.34 3.45 ...
with the number of rows in…
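As a first step, a minimal hedged sketch for making the local penalty values available on the Spark side next to xydata, assuming an existing connection sc:

library(sparklyr)
library(dplyr)

# copy the local penalty data frame into the Spark session as its own table
penalty_tbl <- copy_to(sc, penalty, name = "penalty", overwrite = TRUE)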
I'm trying to read a CSV file into RStudio with the sparklyr package on a Google Compute Engine cluster. This is the configuration:
# Test Spark framework
install.packages("sparklyr")
install.packages("dplyr")
library(sparklyr)
spark_install(version =…
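Once the install and connection succeed, the read itself would look roughly like this sketch; the master, path, and table name are placeholders (on Dataproc-style clusters a gs:// path also works through the GCS connector):

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client")

csv_tbl <- spark_read_csv(sc, name = "my_data",
                          path = "hdfs:///user/me/data.csv",
                          header = TRUE, infer_schema = TRUE)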
I have a Spark dataframe TABLE1 with one column and 100000 rows, each containing a string of identical length:
AA105LONDEN 03162017045262017 16953563ABCDEF
and I would like to separate each row into multiple columns based on the lines…
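A hedged sketch of one way to do the split on the Spark side, using substr() inside mutate() (which dbplyr translates to Spark SQL's substring); the source column name (value) and the character positions are illustrative and should be adjusted to the real fixed-width layout:

library(sparklyr)
library(dplyr)

TABLE1 %>%
  mutate(
    code   = substr(value, 1, 11),    # e.g. "AA105LONDEN"
    dates  = substr(value, 13, 29),   # middle block of digits
    suffix = substr(value, 31, 44)    # trailing block, e.g. "16953563ABCDEF"
  )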