Questions tagged [sparklyr]

sparklyr is an alternative R interface for Apache Spark

sparklyr provides an alternative to the SparkR interface for Apache Spark, built on top of dplyr.


784 questions
4
votes
1 answer

Is there a way to set a name to a csv file in sparklyr using spark_write_csv?

I need to write a data frame to a single csv file, and found out that I can use sdf_coalesce() to coalesce the data into a single partition. I want to find out if there's any way I can change the name of the csv file generated by…
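
A minimal sketch of one common workaround, assuming a local output path and mtcars as stand-in data: Spark always writes a directory of part files, so the single part file is renamed afterwards from R (the target file name below is hypothetical).

library(sparklyr)
library(dplyr)

sc  <- spark_connect(master = "local")
tbl <- copy_to(sc, mtcars, overwrite = TRUE)

out_dir <- tempfile("single_csv_")            # Spark writes a directory, not a file
tbl %>%
  sdf_coalesce(partitions = 1) %>%            # single partition -> single part file
  spark_write_csv(out_dir, header = TRUE)

part <- list.files(out_dir, pattern = "^part-.*csv$", full.names = TRUE)
file.rename(part, file.path(dirname(out_dir), "my_result.csv"))   # hypothetical name
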
4
votes
1 answer

colnames in `sparklyr::spark_apply()` using `dplyr::mutate()`

Assuming sc is an existing spark(lyr) connection, the names given in dplyr::mutate() are ignored: iris_tbl <- sdf_copy_to(sc, iris) iris_tbl %>% spark_apply(function(e){ library(dplyr) e %>% mutate(slm = median(Sepal_Length)) }) ##…
nachti
  • 1,086
  • 7
  • 20
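
A minimal sketch of one workaround, assuming a reasonably recent sparklyr: supply the output column names explicitly through spark_apply()'s columns argument instead of relying on the names assigned inside mutate().

library(sparklyr)
library(dplyr)

iris_tbl <- sdf_copy_to(sc, iris, overwrite = TRUE)

iris_tbl %>%
  spark_apply(
    function(e) {
      e$slm <- stats::median(e$Sepal_Length)   # same computation, base-R style
      e
    },
    columns = c(colnames(iris_tbl), "slm")     # column names supplied explicitly
  )
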
4
votes
1 answer

How to calculate distance between strings using sparklyr?

I need to calculate the distance between two strings in R using sparklyr. Is there a way of using stringdist or any other package? I wanted to use cosine distance, which is available as a method of the stringdist function. Thanks in advance.
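
Two hedged options, assuming a Spark table strings_tbl with columns s1 and s2 (hypothetical names): Spark SQL's built-in levenshtein() passes straight through dplyr, while other stringdist metrics such as cosine can run per-partition via spark_apply(), provided the stringdist package is installed on the workers.

# Levenshtein distance via the Spark SQL built-in (passed through untranslated):
strings_tbl %>% mutate(lev = levenshtein(s1, s2))

# Cosine distance via stringdist inside spark_apply():
strings_tbl %>%
  spark_apply(
    function(e) {
      e$dist <- stringdist::stringdist(e$s1, e$s2, method = "cosine")
      e
    },
    columns = c(colnames(strings_tbl), "dist")
  )
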
4
votes
1 answer

Unnest (separate) multiple column values into new rows using Sparklyr

I am trying to split column values separated by a comma (,) into new rows based on IDs. I know how to do this in R using dplyr and tidyr, but I am looking to solve the same problem in sparklyr. id <- c(1,1,1,1,1,2,2,2,3,3,3) name <-…
Rushabh Patel
  • 2,672
  • 13
  • 34
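
A minimal sketch, assuming the comma-separated values live in a column called name: Spark SQL's split() and explode() are not translated by dplyr, so they pass through and generate one output row per value.

library(sparklyr)
library(dplyr)

df  <- data.frame(id = c(1, 2), name = c("a,b,c", "d,e"))
sdf <- copy_to(sc, df, overwrite = TRUE)

sdf %>%
  mutate(name = explode(split(name, ","))) %>%   # one row per split element
  arrange(id)
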
4
votes
1 answer

Convert a string to logical in R with sparklyr

I have 100 million rows stored in many .csv files in a distributed file system. I'm using spark_read_csv() to load the data without issue. Many of my columns are stored as character logical values: "true", "false", "". I do not have control…
kputschko
  • 766
  • 1
  • 7
  • 21
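
A minimal sketch, assuming tbl is the Spark table and flag the character column: as.logical() is translated to CAST(... AS BOOLEAN), which should map "true"/"false" to TRUE/FALSE and the empty string to NULL (worth verifying on a sample of the real data).

tbl %>%
  mutate(flag = as.logical(flag))   # CAST(flag AS BOOLEAN) on the Spark side

# A more explicit equivalent that does not rely on the cast rules:
tbl %>%
  mutate(flag = ifelse(flag == "true", TRUE,
                ifelse(flag == "false", FALSE, NA)))
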
4
votes
1 answer

How to find columns having missing data in sparklyr

Example sample data:
Si     K    Ca   Ba Fe Type
71.78  0.06 8.75 0  0  1
72.73  0.48 7.83 0  0  1
72.99  0.39 7.78 0  0  1
72.61  0.57 na   0  0  na
73.08  0.55 8.07 0  0  1
72.97  0.64 8.07 0…
vijaynadal
  • 55
  • 5
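
A minimal sketch, assuming glass_tbl is the Spark table (hypothetical name) and that the missing entries are actual NULLs/NAs rather than the literal string "na": count NULLs per column and collect the one-row summary.

library(dplyr)

na_counts <- glass_tbl %>%
  summarise_all(~ sum(as.integer(is.na(.)), na.rm = TRUE)) %>%   # NULL count per column
  collect()

# Columns with at least one missing value:
names(na_counts)[unlist(na_counts) > 0]
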
4
votes
2 answers

Reading files from multiple sub folders in sparklyr

In Spark 2.0 I can combine several file paths into a single load (see e.g. How to import multiple csv files in a single load?). How can I achieve this with sparklyr's spark_read_csv?
Deepdelusion
  • 121
  • 1
  • 6
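
A minimal sketch, assuming the files share a schema and sit under sub-folders of a data/ directory (hypothetical layout): the path handed to spark_read_csv() goes to Spark's reader, which understands glob patterns.

library(sparklyr)

all_csv <- spark_read_csv(
  sc,
  name   = "all_csv",
  path   = "data/*/*.csv",   # hypothetical layout: data/<sub-folder>/<file>.csv
  header = TRUE
)
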
4
votes
1 answer

How to limit the number of lines read from a parquet file in sparklyr

I have a huge parquet file that doesn't fit in memory or on disk when read. Is there a way to use spark_read_parquet to read only the first n lines?
Jader Martins
  • 759
  • 6
  • 26
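
A minimal sketch: spark_read_parquet() has no row-limit argument, but mapping the file lazily with memory = FALSE and taking head(n) keeps more than the first n rows from ever being collected (the path and n below are placeholders).

library(sparklyr)
library(dplyr)

big_tbl <- spark_read_parquet(
  sc,
  name   = "big",
  path   = "path/to/huge.parquet",   # placeholder path
  memory = FALSE                     # do not cache the whole table
)

first_rows <- big_tbl %>%
  head(1000) %>%    # translated to LIMIT 1000 on the Spark side
  collect()
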
4
votes
1 answer

Converting string/chr to date using sparklyr

I've brought a table into Hue which has a column of dates and I'm trying to play with it using sparklyr in RStudio. I'd like to convert a character column into a date column like so: Weather_data = mutate(Weather_data, date2 = as.Date(date,…
Keith
  • 103
  • 1
  • 9
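
A minimal sketch, assuming the strings are ISO-formatted (yyyy-MM-dd): Spark SQL's to_date() passes through dplyr untranslated, so it can take the place of as.Date() inside mutate().

library(dplyr)

Weather_data <- Weather_data %>%
  mutate(date2 = to_date(date))   # Spark SQL to_date(), not R's as.Date()
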
4
votes
2 answers

Connecting to Spark with Sparklyr gives Permission Denied Error

After installing the sparklyr package I followed the instructions here (http://spark.rstudio.com/) to connect to Spark, but I am faced with this error. Am I doing something wrong? Please help me. sc = spark_connect(master = 'local') Error in file(con,…
boral
  • 131
  • 9
4
votes
1 answer

What is the most efficient way to create new Spark Tables or Data Frames in Sparklyr?

Using the sparklyr package on a Hadoop cluster (not a VM), I'm working with several types of tables that need to be joined, filtered, etc... and I'm trying to determine what would be the most efficient way to use the dplyr commands along with the…
quickreaction
  • 675
  • 5
  • 17
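
One pattern worth sketching (a judgment call, not the single most efficient answer): keep the joins and filters lazy in dplyr, then materialize and cache the intermediate result on the cluster with compute() instead of collecting it into R. tbl_a and tbl_b below are hypothetical Spark tables.

library(dplyr)

joined <- tbl_a %>%
  inner_join(tbl_b, by = "id") %>%
  filter(value > 0)

# Materialize as a cached Spark table so later queries reuse it:
joined_tbl <- compute(joined, name = "joined_cached")
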
4
votes
2 answers

Sparklyr/Hive: how to use regex (regexp_replace) correctly?

Consider the following example: dataframe_test <- data_frame(mydate = c('2011-03-01T00:00:04.226Z', '2011-03-01T00:00:04.226Z')) # A tibble: 2 x 1 mydate 1 2011-03-01T00:00:04.226Z 2…
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
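
A minimal sketch for the timestamp strings in the question: regexp_replace() is a Hive/Spark SQL function, so it passes through mutate() as-is; keep in mind that regex backslashes have to be doubled in R strings (e.g. "\\d").

library(dplyr)

dataframe_test %>%
  mutate(mydate_clean = regexp_replace(mydate, "[TZ]", " "))   # strip the T and Z markers
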
4
votes
1 answer

sparklyr can't see databases created in Hive and vice versa

I installed Apache Hive locally and I was trying to read tables via RStudio/sparklyr. I created a database using Hive: hive> CREATE DATABASE test; and I was trying to read that database using the following R…
stochazesthai
  • 617
  • 1
  • 7
  • 20
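
A hedged sketch of the usual direction of a fix: Hive and sparklyr need to agree on the metastore, for example by copying hive-site.xml into Spark's conf/ directory or by pointing the session at the Hive warehouse. The paths below are assumptions, not values from the question.

library(sparklyr)

conf <- spark_config()
conf$spark.sql.warehouse.dir <- "/user/hive/warehouse"   # assumed Hive warehouse location

sc <- spark_connect(master = "local", config = conf)

DBI::dbGetQuery(sc, "SHOW DATABASES")   # should now list the databases created in Hive
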
4
votes
1 answer

What is the options parameter of the sparklyr function spark_write_csv?

I was looking for a way to make spark_write_csv upload only a single file to S3, because I want to save the regression result on S3. I was wondering if options has some parameter which defines the number of partitions. I could not find it anywhere in…
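
For what it's worth, the number of output files is controlled by the partitioning of the table rather than by options; a minimal sketch, with results_tbl and the S3 path as placeholders:

library(sparklyr)
library(dplyr)

results_tbl %>%
  sdf_repartition(partitions = 1) %>%               # one partition -> one part file
  spark_write_csv("s3a://my-bucket/regression/")    # placeholder bucket/path
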
4
votes
2 answers

Specifying column types in sparklyr (spark_read_csv)

I am reading a csv into Spark using sparklyr: schema <- structType(structField("TransTime", "array", TRUE), structField("TransDay", "Date", TRUE)) spark_read_csv(sc, filename, "path", infer_schema = FALSE, schema =…
Levi Brackman
  • 325
  • 2
  • 17
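
structType()/structField() come from SparkR, not sparklyr. A minimal sketch of the sparklyr way, using the columns argument with a named vector of types; the column names and "path" are taken from the question, while the type names are assumptions.

library(sparklyr)

trans_tbl <- spark_read_csv(
  sc,
  name         = "trans",
  path         = "path",                      # path from the question
  infer_schema = FALSE,
  columns      = c(TransTime = "character",   # assumed type mapping
                   TransDay  = "date")
)
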