I have a dateset like this dummy sample. It contains two months data for digital user activity.
df <- tibble::tribble(
~date, ~user_id, ~app_id, ~total_usage,
20190701, "18120439-aa", 383, 223,
20190702, "18120439-aa", 383, 147,
20190701, "18120439-ab", 382, 10,
20190701, "18120439-ab", 383, 395,
20190702, "18120439-ab", 383, 495,
20190703, "18120439-ab", 383, 347,
20190807, "18160102-aa", 262, 6,
20190808, "18160102-aa", 262, 24,
20190711, "18160102-aa", 382, 3,
20190725, "18160102-aa", 382, 11,
20190727, "18160102-aa", 382, 48,
20190702, "18160102-aa", 383, 6,
20190705, "18160102-aa", 383, 42,
20190706, "18160102-aa", 383, 104,
20190708, "18160102-aa", 383, 12,
20190709, "18160102-aa", 383, 13,
20190710, "18160102-aa", 383, 114,
20190712, "18160102-aa", 383, 187,
20190713, "18160102-aa", 383, 37,
20190715, "18160102-aa", 383, 58,
20190716, "18160102-aa", 383, 40,
20190717, "18160102-aa", 383, 40,
20190718, "18160102-aa", 383, 35,
20190719, "18160102-aa", 383, 19,
20190720, "18160102-aa", 383, 63,
20190723, "18160102-aa", 383, 2,
20190726, "18160102-aa", 383, 69,
20190729, "18160102-aa", 383, 31,
20190730, "18160102-aa", 383, 26,
20190731, "18160102-aa", 383, 41,
20190802, "18160102-aa", 383, 50,
20190805, "18160102-aa", 383, 34,
20190806, "18160102-aa", 383, 23,
20190807, "18160102-aa", 383, 14
)
First Task
I made a tsibble object with the key=c(user_id, app_id)
and date
as a index. First I tried to get the average usage per user per app_id in the whole period. The user can be online any number of days , 1 day, 10 days, 30 or every day == 60 days. I wanted to calculate rollapply mean based on user number of activity day.
example : user 18160102-aa were only active two days and used app_id 262 , first day use 6 minutes, second day 24 so avg usage for this user during the whole period for app_id = 262 is 15 minutes.
I used tsibble::tile_dbl to calculate average of window size 60 ( 2 months = 60 days) for the entire period and then convert it back to tibble and remove duplicate row. like this :
library(tidyverse)
library(tsibble)
df %>%
#create a tsibble object
as_tsibble(key = c(user_id, app_id), index = date) %>%
group_by_key() %>%
# calculate average during the activity period
mutate(Avg_period = tile_dbl(total_usage, ~ mean(., na.rm = TRUE), .size = 60)) %>%
as_tibble() %>%
select(-total_usage, -date) %>%
distinct()
The output seems correct however I am wondering if there is a better way to do this without making a duplicate?
Second Task
How can I set window size if I want to calculate average and rolling up usage for last 7 days, 14 and 21 day from a specific days?