Is there a way to replicate the rows of a Spark dataframe using sparklyr/dplyr functions?
sc <- spark_connect(master = "spark://####:7077")
df_tbl <- copy_to(sc, data.frame(row1 = 1:3, row2 = LETTERS[1:3]), "df")
This is the desired…
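For reference, one approach that works on Spark is a dummy-key join against a helper table of replication indices; the names times, rep_tbl, dummy and rep_id below are illustrative, and this is only a sketch:

library(dplyr)

times <- 3
rep_tbl <- copy_to(sc, data.frame(rep_id = seq_len(times)), "rep_tbl", overwrite = TRUE)

df_replicated <- df_tbl %>%
  mutate(dummy = 1L) %>%
  inner_join(rep_tbl %>% mutate(dummy = 1L), by = "dummy") %>%  # cross join via constant key
  select(-dummy, -rep_id)                                       # drop the helper columns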
I am trying to use the group_by() and mutate() functions in sparklyr to concatenate rows in a group.
Here is a simple example that I think should work but doesn't:
library(sparklyr)
d <- data.frame(id=c("1", "1", "2", "2", "1", "2"),
…
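For reference, paste() inside mutate() does not translate to Spark SQL; the usual fix is to aggregate with summarise() using Spark SQL's collect_list() and concat_ws(), which dplyr passes through unchanged. A sketch, with an illustrative column name txt:

d_tbl <- copy_to(sc, d, "d", overwrite = TRUE)

d_tbl %>%
  group_by(id) %>%
  summarise(combined = concat_ws(" ", collect_list(txt)))  # one concatenated string per id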
I'm using Spark 2.0.2 with sparklyr 0.5.4-9004 in RStudio on a Windows server.
Every once in a while, when I try to collect, read, or write data from the Spark server, I get the following error:
Error in UseMethod("invoke") :
…
I have two tables that I want to full-join using dplyr, but I don't want it to drop any of the columns. Per the documentation and my own experience, it keeps only the join column from the left-hand side. This is a problem when you have a row…
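For reference, a common workaround is to duplicate the join key on the right-hand side before joining, so its values survive the join; left_tbl, right_tbl, key and key_copy below are illustrative names, and this is only a sketch:

library(dplyr)

right2 <- right_tbl %>% mutate(key_copy = key)   # keep a copy of the join key

joined <- full_join(left_tbl, right2, by = "key") %>%
  mutate(key = coalesce(key, key_copy))          # recover key values for right-only rows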
I am getting heap space errors even on fairly small datasets, and I can be sure that I'm not running out of system memory. For example, consider a dataset with about 20M rows and 9 columns that takes up 1GB on disk. I am playing with it on a…
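For reference, in local mode the default driver heap is small, so heap errors often go away once driver memory is raised through spark_config(); the "8G" values below are an assumption, so size them to your machine:

library(sparklyr)

conf <- spark_config()
conf$`sparklyr.shell.driver-memory` <- "8G"    # assumption: adjust to available RAM
conf$`sparklyr.shell.executor-memory` <- "8G"  # assumption: adjust to available RAM
sc <- spark_connect(master = "local", config = conf)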
I'm trying to connect R to Spark using sparklyr.
I followed the tutorial from the RStudio blog.
I tried installing sparklyr using
install.packages("sparklyr"), which went fine, but in another post I saw that there was a bug in the sparklyr_0.4 version. So I…
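For reference, the development version can be installed from GitHub instead of the CRAN release; a sketch:

install.packages("devtools")
devtools::install_github("rstudio/sparklyr")  # development version, ahead of CRAN's 0.4
library(sparklyr)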
I'm new to Spark and GCP. I've tried to connect to it with
sc <- spark_connect(master = "IP address")
but it obviously doesn't work (e.g. there is no authentication).
How should I do that? Is it possible to connect to it from outside Google Cloud?
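For reference, the Spark master URL is generally not meant to be reached over the open internet; one common pattern is to run R on the cluster's master node (e.g. over SSH) and connect through YARN. A sketch, assuming a Dataproc-style cluster where the SPARK_HOME path below is an assumption:

library(sparklyr)

# run on the cluster's master node, where the Spark/YARN configuration is local
Sys.setenv(SPARK_HOME = "/usr/lib/spark")  # assumption: default path on Dataproc images
sc <- spark_connect(master = "yarn-client")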
Is the sparklyr R package able to connect to YARN-managed Hadoop clusters? This doesn't seem to be documented in the cluster deployment documentation. Using the SparkR package that ships with Spark, it is possible by doing:
# set R environment…
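For reference, sparklyr does support YARN via master = "yarn-client"; a sketch, where both paths are assumptions to be pointed at your Spark install and the directory containing yarn-site.xml:

library(sparklyr)

Sys.setenv(SPARK_HOME = "/usr/lib/spark")         # assumption: your Spark installation
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")  # assumption: your YARN config directory
sc <- spark_connect(master = "yarn-client")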
I would like to extract the last part of the string (after the last forward slash).
When I use the following code it fails with the error:
library(sparklyr)
library(tidyverse)
sc <- spark_connect(method = "databricks")
tibble(my_string =…
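For reference (the failing code above is truncated), one approach that works is Spark SQL's regexp_extract(), which dplyr passes through unchanged; "[^/]+$" matches everything after the last forward slash, and index 0 returns the whole match. A sketch with illustrative data:

strings_tbl <- copy_to(sc, tibble(my_string = c("a/b/c", "x/y")), "strings", overwrite = TRUE)

strings_tbl %>%
  mutate(last_part = regexp_extract(my_string, "[^/]+$", 0))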
I have tried the code below, and combinations of it, to read all the files in an S3 folder, but nothing seems to work. Sensitive information/code has been removed from the script below. There are 6 files, each 6.5 GB.
#Spark…
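For reference, a sketch of one way to read every file in an S3 folder with spark_read_csv(); the bucket, folder, and credential values are placeholders, and this assumes the s3a connector is available on the cluster:

library(sparklyr)

conf <- spark_config()
conf$spark.hadoop.fs.s3a.access.key <- "YOUR_ACCESS_KEY"  # placeholder
conf$spark.hadoop.fs.s3a.secret.key <- "YOUR_SECRET_KEY"  # placeholder
sc <- spark_connect(master = "local", config = conf)

df <- spark_read_csv(sc, name = "s3_data",
                     path = "s3a://my-bucket/my-folder/",  # placeholder; reads all files in the folder
                     memory = FALSE)                       # avoid caching ~39 GB of data in memory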
Note: I have referred to an existing answer, but although the data gets un-nested, I could not convert it into CSV format.
I want to flatten data of different data types by using the explode functionality. The dataset contains arrays and structs. I want…
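For reference, Spark SQL's explode() can be used inside mutate(), since dplyr passes it through unchanged, producing one row per array element; struct columns still need their fields selected out to atomic columns before a CSV export will succeed. A sketch with illustrative names (nested_tbl, arr_col):

library(sparklyr)
library(dplyr)

flat_tbl <- nested_tbl %>%
  mutate(element = explode(arr_col)) %>%  # one row per array element
  select(-arr_col)                        # drop the original array column

spark_write_csv(flat_tbl, path = "path/to/flat_csv")  # placeholder output path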
Spark 2.0 with Hive
Let's say I am trying to write a Spark dataframe, irisDf, to ORC and save it to the Hive metastore.
In Spark (Scala) I would do that like this:
irisDf.write.format("orc")
.mode("overwrite")
.option("path", "s3://my_bucket/iris/")
…
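For reference, a sketch of the sparklyr side: spark_write_orc() writes the ORC files to a path without a metastore entry, while spark_write_table() registers a metastore table and passes options through to the writer (whether the path/format options round-trip exactly may depend on the sparklyr version):

library(sparklyr)

iris_tbl <- copy_to(sc, iris, "iris_tmp")

# writes ORC files to the path, no metastore entry
spark_write_orc(iris_tbl, path = "s3://my_bucket/iris/", mode = "overwrite")

# registers a table in the metastore
spark_write_table(iris_tbl, name = "iris", mode = "overwrite",
                  options = list(path = "s3://my_bucket/iris/"))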
I tried the following to row-bind two Spark dataframes, but it gave an error message:
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris)
iris_tbl1 <- copy_to(sc, iris, "iris1")
iris_tbl2 =…
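For reference, sdf_bind_rows() is sparklyr's analogue of dplyr::bind_rows() for Spark tables (available from sparklyr 0.6 onward); dplyr::union_all() also works when the two tables have identical columns. A sketch:

iris_all <- sdf_bind_rows(iris_tbl1, iris_tbl2)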
I have looked all over and I'm still unable to get those three dplyr functions to work within sparklyr. I have a reproducible example below. First, some session info:
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under:…