You can do this with an outer join.
list_of_dates = data_with_missing_dates %>%
select(month_year) %>%
distinct()
data_with_filled_dates = data_with_missing_dates %>%
right_join(list_of_dates, by = "month_year")
These are all standard dplyr
commands, hence dbplyr
can translate them into bigquery.
The above assumes that your existing data includes all the dates you want in the final output (but spread over different ID values), hence list_of_dates
can be constructed from your initial dataset.
If there are dates that do not appear for any ID in your initial data that you want to appear in your final data, then you will need to construct list_of_dates
some other way. In this case even complete()
would not be sufficient by itself.
Edit so each ID has its own start and end
list_of_dates = data_with_missing_dates %>%
select(month_year) %>%
distinct() %>%
mutate(placeholder = 1)
date_limits = data_with_missing_dates %>%
group_by(ID) %>%
summarise(min_date = min(month_year),
max_date = max(month_year)) %>%
mutate(placeholder = 1)
data_with_filled_dates = date_limits %>%
outer_join(list_of_dates, by = "placeholder") %>%
filter(min_date <= month_year,
max_date >= month_year) %>%
select(ID, month_year) %>%
left_join(data_with_missing_dates, by = c("ID", "month_year"))