I am trying to replicate the tidyr:complete
function in sparklyr. I have a dataframe with some missing values and I have to fill out those rows. In dplyr/tidyr I can do:
data <- tibble(
"id" = c(1,1,2,2),
"dates" = c("2020-01-01", "2020-01-03", "2020-01-01", "2020-01-03"),
"values" = c(3,4,7,8))
# A tibble: 4 x 3
id dates values
<dbl> <chr> <dbl>
1 1 2020-01-01 3
2 1 2020-01-03 4
3 2 2020-01-01 7
4 2 2020-01-03 8
data %>%
mutate(dates = as_date(dates)) %>%
group_by(id) %>%
complete(dates = seq.Date(min(dates), max(dates), by="day"))
# A tibble: 6 x 3
# Groups: id [2]
id dates values
<dbl> <date> <dbl>
1 1 2020-01-01 3
2 1 2020-01-02 NA
3 1 2020-01-03 4
4 2 2020-01-01 7
5 2 2020-01-02 NA
6 2 2020-01-03 8
However the complete
function does not exist in sparklyr
.
data_spark %>%
mutate(dates = as_date(dates)) %>%
group_by(id) %>%
complete(dates = seq.Date(min(dates), max(dates), by="day"))
Error in UseMethod("complete_") :
no applicable method for 'complete_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"
Is there a way to set a UDF or to achieve a similar result?
Thank you