Highest Voted 'sparklyr' Questions

0

votes

2 answers

Count number of unique elements in each column with dplyr in sparklyr

I'm trying to count the number of unique elements in each column of the Spark dataset s. However, it seems that Spark doesn't recognize tally(): k<-collect(s%>%group_by(grouping_type)%>%summarise_each(funs(tally(distinct(.))))) Error:…

r apache-spark statistics dplyr sparklyr

asked Apr 19 '18 at 20:47

StatsBoy

35
5

0

votes

0 answers

How to find pairs of data by timestamp-window & values from different rows in sparklyr?

My test-data looks like this: (it's graph-like) elemuid <- c(1, 2, 3, 4, 5, 6, 7) timestamp <- c("2018-02-10 23:00:00", "2018-02-10 23:01:00", "2018-02-10 22:59:00", "2018-02-10 22:40:00", "2018-02-10 22:39:00", "2018-02-10 22:37:00", "2018-02-10…

r apache-spark graph sparklyr

asked Apr 19 '18 at 06:04

user60856839

133
11

0

votes

1 answer

Sparklyr Spark 2.1 generate top n recommendation

R version 3.3.0 (2016-05-03), sparklyr version ‘0.7.0’, Spark version 2.1 on YARN client. I am using the Spark framework in R via sparklyr to generate top-5 recommendations for products that are likely to be sold, along with their expected quantity, using ALS…

r apache-spark-mllib recommendation-engine sparklyr top-n

asked Apr 10 '18 at 11:59

user2537864

13
4

0

votes

2 answers

sparklyr spark_read_parquet Reading String Fields as Lists

I have a number of Hive files in parquet format that contain both string and double columns. I can read most of them into a Spark Data Frame with sparklyr using the syntax below: spark_read_parquet(sc, name = "name", path = "path", memory =…

r hive apache-spark-sql parquet sparklyr

asked Mar 09 '18 at 19:29

bshelt141

1,183
15
31

0

votes

0 answers

Wrong data when reading with sparklyr

I am using R and sparklyr to process some data from Spark. I am reading two parquet files, in sequence, with v1 <- spark_read_parquet(sc, "events","s3n://project/sessions.parquet", memory="true") head(v1) v2 <- spark_read_parquet(sc,…

r apache-spark sparklyr

asked Mar 01 '18 at 16:12

user2345448

159
2
11

0

votes

0 answers

I want to process tens of thousands of columns using Spark via sparklyr, but I can't

I tried using sdf_pivot() to widen a column with duplicate values into a very large number of columns. I planned to use these columns as the feature space for training an ML model. Example: I have a language element sequence in one column…

r apache-spark-sql sentiment-analysis sparklyr

asked Feb 21 '18 at 09:32

Alexey Burnakov

259
2
14

0

votes

2 answers

How do you access the model parameters in ml_decision_tree in the Sparklyr package?

I have some sample code that is only working on one machine. After some testing, I discovered that the machine that worked was running R 3.4.2 while everything else was running 3.4.3. After some work I discovered that the way you access the…

r apache-spark sparklyr

asked Feb 16 '18 at 21:11

Bob Wakefield

3,739
4
20
30

0

votes

1 answer

Convert variable as Timestamp in sparklyr

I know similar questions have been asked multiple times before, but I have tried all of those options and still do not get the desired result. I have an sdf named kl in the following format: CONSUMER_ID TimeStamp TimeStamp2 …

r apache-spark sparklyr

asked Feb 09 '18 at 07:49

ROY

268
2
11

0

votes

0 answers

Calling any Spark MLlib function from R?

I found this example of calling spark.mllib functions directly from the Scala library. I don't understand everything here, but is it possible to call any MLlib function (one that is not exposed via, say, sparklyr) this way? In particular, I am…

r scala apache-spark sparklyr

asked Feb 07 '18 at 17:28

Alexey Burnakov

259
2
14

0

votes

1 answer

Error after trying to make a date column from a character column

Using library sparklyr, I try to create a date variable in the Spark dataframe this way (which works in R): # Researching SPARK…

r date dplyr sparklyr

asked Feb 01 '18 at 17:59

Alexey Burnakov

259
2
14

0

votes

1 answer

Connecting Spark with RStudio on macOS gives Hive error

I am trying to use Spark in RStudio using the sparklyr library on macOS. I have installed it using the following commands: # Install the sparklyr package install.packages("sparklyr") # Now load the library library(sparklyr) # Install Spark to your…

r sparklyr

asked Jan 22 '18 at 17:17

Regressor

1,843
4
27
67

0

votes

2 answers

How to implement lapply function in R using package "sparklyr"

I am pretty new to Spark. I have searched the web but haven't found anything satisfactory. I have always run parallel computations using the command mclapply and I like its structure (i.e., first parameter used as scrolling…

r apache-spark parallel-processing sparklyr mclapply

asked Jan 15 '18 at 14:28

ciccioz

75
6

0

votes

1 answer

How to select the same column of a Spark data frame multiple times in Sparklyr?

I have a Spark data frame sdf. I would like to generate another table from columns of sdf, in which some columns may be repeated. The following is the desired output. > sdf %>% select(DC1_Y1,DC2_Y1,DC2_Y1) # Source: lazy query [?? x 3] #…

r select dplyr sparklyr

asked Jan 15 '18 at 07:29

axiom

406
1
4
16

0

votes

0 answers

Using ml_save with R/Spark

I am training some models (random forest) using the ml library in Spark, R, and sparklyr. Everything works, but now I need to save those models so they can be used to make predictions for new data. I call ml_save(rfW1,w$fileName) where rfW1 is the…

r apache-spark save sparklyr

asked Dec 26 '17 at 20:33

user2345448

159
2
11

0

votes

1 answer

Sparklyr read database table to distributed DF

I am trying to figure out if there is a way to directly read a DB table into a SparkR dataframe. I have RStudio installed on an EMR cluster which has my Hive metastore on it. I know I can do the following: library(sparklyr) library(dplyr) sc <-…

r apache-spark amazon-emr sparkr sparklyr

asked Nov 30 '17 at 21:25

user295944

273
4
17
