I'm trying to count the number of unique elements in each column of the Spark dataset s.
However, it seems that Spark doesn't recognize tally():
k <- collect(s %>% group_by(grouping_type) %>% summarise_each(funs(tally(distinct(.)))))
Error:…
R version 3.3.0 (2016-05-03)
Sparklyr version ‘0.7.0’
Spark version 2.1 on YARN client
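Not part of the question, but a sketch of the usual workaround: tally() has no Spark SQL translation, while n_distinct() does (it is translated to COUNT(DISTINCT ...) and runs inside Spark). Assuming the same connection and table s:

```r
library(sparklyr)
library(dplyr)

# n_distinct() is translated to Spark SQL's COUNT(DISTINCT ...),
# so the aggregation happens in Spark, not in R.
k <- s %>%
  group_by(grouping_type) %>%
  summarise_all(n_distinct) %>%
  collect()
```

summarise_all() applies the aggregate to every non-grouping column, which matches the per-column distinct counts the question asks for.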
I am using the Spark framework in R via sparklyr to generate top-5 recommendations for products that are likely to be sold, along with their expected quantities, using ALS…
I have a number of Hive files in Parquet format that contain both string and double columns. I can read most of them into a Spark DataFrame with sparklyr using the syntax below:
spark_read_parquet(sc, name = "name", path = "path", memory =…
I am using R and sparklyr to process some data from Spark. I am reading two Parquet files, in sequence, with
v1 <- spark_read_parquet(sc, "events", "s3n://project/sessions.parquet", memory = TRUE)
head(v1)
v2 <- spark_read_parquet(sc,…
I tried using sdf_pivot() to widen a column with duplicate values into multiple (a very large number of) columns. I planned to use these columns as the feature space for training an ML model.
Example: I have a language element sequence in one column…
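For reference, a minimal sdf_pivot() sketch under hypothetical column names (`id` and `token` are illustrative, not from the question): each distinct value of `token` becomes its own column, with a per-id count as the cell value.

```r
library(sparklyr)
library(dplyr)

# Default aggregation for sdf_pivot() is a count of occurrences,
# which yields a bag-of-words style feature matrix.
wide <- sdf_pivot(sdf, id ~ token)
```

Note that the number of output columns equals the number of distinct values in the pivoted column, which is what makes the result "very wide" here.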
I have some sample code that works on only one machine. After some testing, I discovered that the machine that worked was running R 3.4.2, while the others were running 3.4.3.
After some work I discovered that the way you access the…
I know similar questions have been asked multiple times before, but I have tried all those options and still do not get the desired result.
I have a Spark DataFrame (sdf) named kl in the following format:
CONSUMER_ID TimeStamp TimeStamp2
…
I found this example of calling spark.mllib functions directly from a Scala library. I don't fully understand everything here, but is it possible to call any MLlib function (one that is not exposed via, say, sparklyr) this way? In particular I am…
I am trying to use Spark from RStudio on macOS via the sparklyr library. I have installed it using the following commands:
# Install the sparklyr package
install.packages("sparklyr")
# Now load the library
library(sparklyr)
# Install Spark to your…
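The setup typically continues with spark_install() and a connection; a sketch of the usual flow (the Spark version here is an example, not from the question):

```r
library(sparklyr)

# Download and install a local Spark distribution.
spark_install(version = "2.1.0")

# Connect to the local installation.
sc <- spark_connect(master = "local")
```

spark_connect() returns the connection object that every subsequent sparklyr call (spark_read_parquet(), sdf_copy_to(), etc.) takes as its first argument.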
I am pretty new to Spark; I have tried to look for something on the web, but I haven't found anything satisfactory.
I have always run parallel computations using the command mclapply and I like its structure (i.e., first parameter used as scrolling…
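The closest sparklyr analog of mclapply is spark_apply(), which maps an R function over the partitions of a Spark data frame. A minimal sketch, assuming a local connection (table and column names are illustrative):

```r
library(sparklyr)
library(dplyr)

sc  <- spark_connect(master = "local")
sdf <- sdf_copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

# spark_apply() runs the supplied R function once per partition,
# each call receiving that partition as a plain data.frame.
result <- spark_apply(sdf, function(df) {
  data.frame(mean_mpg = mean(df$mpg))
})
```

Unlike mclapply, the function executes on the Spark workers, so R (and any packages the closure uses) must be available on each worker node.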
I have a Spark data frame sdf. I would like to generate another table from the columns of sdf; however, those columns can repeat.
The following is the desired output.
> sdf %>% select(DC1_Y1, DC2_Y1, DC2_Y1)
# Source: lazy query [?? x 3]
#…
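One hedged workaround, since dplyr's select() deduplicates repeated columns: alias the duplicate with mutate() first (the `_dup` suffix here is hypothetical).

```r
library(dplyr)

# Create a copy of the repeated column under a new name, then
# select all three; rename afterwards if identical headers are needed.
sdf %>%
  mutate(DC2_Y1_dup = DC2_Y1) %>%
  select(DC1_Y1, DC2_Y1, DC2_Y1_dup)
```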
I am training some models (random forest) using the ML library in Spark via R and sparklyr. Everything works, but now I need to save those models so they can be used to make predictions on new data.
I call
ml_save(rfW1, w$fileName)
where rfW1 is the…
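For reference, a sketch of the usual save/restore round trip (the path and `new_sdf` are illustrative assumptions, not from the question):

```r
library(sparklyr)

# ml_save() writes the fitted model to a directory; ml_load()
# restores it on a (possibly new) Spark connection.
ml_save(rfW1, "hdfs:///models/rfW1", overwrite = TRUE)
rf_restored <- ml_load(sc, "hdfs:///models/rfW1")

# Score new data with the restored model.
pred <- ml_predict(rf_restored, new_sdf)
```

The path must be accessible to the cluster (e.g. HDFS when running on YARN), which is a common source of save/load errors.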
Hi, I am trying to figure out if there is a way to read a DB table directly into a SparkR dataframe. I have RStudio installed on an EMR cluster which hosts my Hive metastore.
I know I can do the following:
library(sparklyr)
library(dplyr)
sc <-…
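For comparison, a minimal sketch of referencing an existing Hive table from sparklyr once a Hive-enabled connection exists (the table name is hypothetical):

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client")

# Lazy reference: work is pushed down to Spark until collect().
db_tbl <- tbl(sc, "my_hive_table")

# Or register the Hive table as a Spark DataFrame directly.
db_df <- spark_read_table(sc, "my_hive_table")
```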