Questions tagged [sparklyr]

sparklyr is an alternative R interface for Apache Spark

sparklyr provides an alternative to the SparkR interface for Apache Spark, built on top of dplyr.

784 questions
6 votes · 1 answer

How to store data in a Spark cluster using sparklyr?

If I connect to a Spark cluster, copy some data to it, and disconnect, ...
library(dplyr)
library(sparklyr)
sc <- spark_connect("local")
copy_to(sc, iris)
src_tbls(sc)
## [1] "iris"
spark_disconnect(sc)
then the next time I connect to Spark, the…
Richie Cotton
6 votes · 1 answer

How to implement Stanford CoreNLP wrapper for Apache Spark using sparklyr?

I am trying to create an R package so I can use the Stanford CoreNLP wrapper for Apache Spark (by Databricks) from R. I am using the sparklyr package to connect to my local Spark instance. I created a package with the following dependency function…
5 votes · 1 answer

What is the point of using compute() in sparklyr?

In the sparklyr tutorial I'm following, it says I can use compute() to store the results of the preceding dplyr statement in a new Spark data frame. The code in 'code 1' creates a new Spark data frame called "NewSparkDataframe" and a spark_tbl is…
Steve
5 votes · 1 answer

could not find function "switch_lang"

Getting this error; does anyone have an idea what is triggering it? #### sc is a spark…
Mouad_Seridi
5 votes · 2 answers

Is there a way to fill in missing dates with 0s using dplyr?

I have a dataset like this:

    id  date     value
    1   8/06/12  1
    1   8/08/12  1
    2   8/07/12  2
    2   8/08/12  1

Every id should have a value for every date. When an id is missing a particular date,…
Jacob Curtis
5 votes · 1 answer

Find variables making Primary Key using SparkR

Here is my toy data:
df <- tibble::tribble(
  ~var1, ~var2, ~var3, ~var4, ~var5, ~var6, ~var7,
  "A", "C", 1L, 5L, "AA", "AB", 1L,
  "A", "C", 2L, 5L, "BB", "AC", 2L,
  "A", "D", 1L, 7L, "AA", "BC", 2L,
  …
Geet
5 votes · 2 answers

How can I train a random forest with a sparse matrix in Spark?

Consider this simple example that uses sparklyr:
library(sparklyr)
library(janeaustenr) # to get some text data
library(stringr)
library(dplyr)
mytext <- austen_books() %>%
  mutate(label = as.integer(str_detect(text, 'great'))) #create a fake…
ℕʘʘḆḽḘ
5 votes · 1 answer

Importing multiple files in sparklyr

I'm very new to sparklyr and Spark, so please let me know if this is not the "Spark" way to do this. My problem: I have 50+ .txt files of around 300 MB each, all in the same folder (call it x), that I need to import into sparklyr, preferably one…
shitoushan
5 votes · 3 answers

How to convert a timestamp into a string (without changing the timezone)?

I have some Unix times that I convert to timestamps in sparklyr, and for some reason I also need to convert them into strings. Unfortunately, it seems that during the conversion to string, Hive converts them to EST (my locale). df_new <-…
ℕʘʘḆḽḘ
5 votes · 1 answer

Calculate quantile by group in sparklyr

I have a data frame in Spark and would like to calculate the 0.1 quantile after grouping by a specific column. For example:
> library(sparklyr)
> library(tidyverse)
> con = spark_connect(....)
> diamonds_sdl = copy_to(con, diamonds)
> diamonds #…
dalloliogm
5 votes · 2 answers

How to get the significance of coefficients in logistic regression using `ml_logistic_regression`

I want to know the significance of each coefficient of a logistic regression model fitted with the Spark function ml_logistic_regression. The code is as follows:
# data in R
library(MASS)
data(birthwt)
str(birthwt)
detach("package:MASS", unload=TRUE)
#…
Joe
5 votes · 1 answer

Sparklyr using case_when with variables

sparklyr fails when using case_when with external variables. Working example:
test <- copy_to(sc, tibble(column = c(1,2,3,4)))
test %>%
  mutate(group = case_when(
    column %in% c(1,2) ~ 'group 1',
    column %in%…
rookie error
5 votes · 1 answer

Sparklyr - Decimal precision 8 exceeds max precision 7

I'm trying to copy a big database into Spark using spark_read_csv, but I get the following error: Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 16.0 failed 4 times, most recent failure: …
Igor
5 votes · 1 answer

Sparklyr - Unable to copy data.frames into Spark using copy_to

I'm trying to copy a big data frame (around 5.8 million records) into Spark using sparklyr's copy_to function. First, when loading the data using fread (data.table) and applying copy_to, I got the following error: Error in…
Igor
5 votes · 1 answer

sparklyr livy connection with Kerberos

I'm able to connect to a non-Kerberized Spark cluster through the Livy service without problems from a remote RStudio desktop (Windows). However, if Kerberos security is enabled, the connection fails:
library(sparklyr)
sc <-…
runr