Highest Voted 'sparklyr' Questions

8 votes, 5 answers

spark: java.io.IOException: No space left on device [again!]

I am getting a java.io.IOException: No space left on device error after running a simple query in sparklyr. I use the latest versions of both Spark (2.1.1) and sparklyr. df_new <- spark_read_parquet(sc, "/mypath/parquet_*", name = "df_new", memory =…

r apache-spark pyspark sparklyr

asked Jul 03 '17 at 14:32

ℕʘʘḆḽḘ


8 votes, 3 answers

Connect sparklyr to remote spark connection

I would like to connect my local desktop RStudio session to a remote Spark session via sparklyr. When you add a new connection in the sparklyr UI tab in RStudio and choose cluster, it says that you have to be running on the cluster, or have a…

r apache-spark sparklyr

asked Sep 30 '16 at 19:28

Jim Crozier


7 votes, 2 answers

How to set SPARK_LOCAL_DIRS parameter using spark-env.sh file

I am trying to change the location spark writes temporary files to. Everything I've found online says to set this by setting the SPARK_LOCAL_DIRS parameter in the spark-env.sh file, but I am not having any luck with the changes actually taking…

apache-spark sparklyr

asked Aug 29 '18 at 02:41

jay


7 votes, 1 answer

How to train a ML model in sparklyr and predict new values on another dataframe?

Consider the following example dtrain <- data_frame(text = c("Chinese Beijing Chinese", "Chinese Chinese Shanghai", "Chinese Macao", "Tokyo Japan Chinese"), …

r apache-spark apache-spark-ml sparklyr

asked May 25 '18 at 17:26

ℕʘʘḆḽḘ


7 votes, 1 answer

"GC overhead limit exceeded" on cache of large dataset into spark memory (via sparklyr & RStudio)

I am very new to the Big Data technologies I am attempting to work with, but have so far managed to set up sparklyr in RStudio to connect to a standalone Spark cluster. Data is stored in Cassandra, and I can successfully bring large datasets into…

r apache-spark cassandra sparklyr

asked Mar 06 '17 at 12:12

renegademonkey


7 votes, 0 answers

Sparklyr "embedded nul in string" when collecting

In R I have a spark connection and a DataFrame as ddf. library(sparklyr) library(tidyverse) sc <- spark_connect(master = "foo", version = "2.0.2") ddf <- spark_read_parquet(sc, name='test', path="hdfs://localhost:9001/foo_parquet") Since it's not a…

r apache-spark dplyr sparklyr

asked Feb 20 '17 at 09:38

Tim


7 votes, 1 answer

Changing column data type to factor with sparklyr

I am pretty new to Spark and am currently using it via the R API through the sparklyr package. I created a Spark data frame from a Hive query. The data types are not specified correctly in the source table and I'm trying to reset the data type by…

r apache-spark dplyr apache-spark-sql sparklyr

asked Dec 21 '16 at 02:22

b396958


6 votes, 1 answer

what is the difference between dplyr::copy_to and sparklyr::sdf_copy_to?

I am using the sparklyr library to interact with Spark. There are two functions for putting a data frame into a Spark context: 'dplyr::copy_to' and 'sparklyr::sdf_copy_to'. What is the difference, and when is it recommended to use one…

r dplyr sparklyr

asked May 15 '19 at 11:57

Sergio Marrero Marrero


6 votes, 1 answer

Writing a function to use with spark_apply() from sparklyr

test <- data.frame('prod_id'= c("shoe", "shoe", "shoe", "shoe", "shoe", "shoe", "boat", "boat","boat","boat","boat","boat"), 'seller_id'= c("a", "b", "c", "d", "e", "f", "a","g", "h", "r", "q", "b"), 'Dich'= c(1, 0,…

r dplyr sparklyr

asked Dec 04 '18 at 06:09

Kreitz Gigs


6 votes, 1 answer

Extract and Visualize Model Trees from Sparklyr

Does anyone have any advice about how to convert the tree information from sparklyr's ml_decision_tree_classifier, ml_gbt_classifier, or ml_random_forest_classifier models into a.) a format that can be understood by other R tree-related libraries…

r apache-spark random-forest decision-tree sparklyr

asked Nov 02 '18 at 18:14

RealViaCauchy


6 votes, 3 answers

Find out if 2 tables (`tbl_spark`) are equal without collecting them using sparklyr

Suppose there are 2 tables or table references in Spark which you want to compare, e.g. to ensure that your backup worked correctly. Is it possible to do that remotely in Spark? Because it's not useful to copy all the data to R using…

r apache-spark dataframe dplyr sparklyr

asked Jul 26 '18 at 08:51

nachti


6 votes, 1 answer

Sparklyr ignoring line delimiter

I'm trying to read a .csv of ~2 GB (5 million lines) in sparklyr with: bigcsvspark <- spark_read_csv(sc, "bigtxt", "path", delimiter = "!", infer_schema = FALSE, …

r csv sparklyr

asked Oct 13 '17 at 19:01

Jader Martins


6 votes, 1 answer

How to use a predicate while reading from JDBC connection?

By default, spark_read_jdbc() reads an entire database table into Spark. I've used the following syntax to create these connections. library(sparklyr) library(dplyr) config <- spark_config() config$`sparklyr.shell.driver-class-path` <-…

r apache-spark jdbc sparklyr

asked Jul 31 '17 at 16:26

Jake Russ


6 votes, 3 answers

sparklyr write data to hdfs or hive

I tried using sparklyr to write data to HDFS or Hive, but was unable to find a way. Is it even possible to write an R data frame to HDFS or Hive using sparklyr? Please note, my R and Hadoop are running on two different servers, thus I need a way…

sparklyr

asked Jun 27 '17 at 21:58

Rahul


6 votes, 3 answers

Access table in other than default scheme (database) from sparklyr

Now that I have managed to connect to our (new) cluster using sparklyr with the yarn-client method, I can show only the tables from the default scheme. How can I connect to scheme.table? Using DBI it's working e.g. with the following line: dbGetQuery(sc,…

r apache-spark dplyr sparklyr

asked May 05 '17 at 13:35

nachti

