I want to calculate a some descriptive statistics:
- by specific customer (in example,
customer_id
field) - for a sliding time window (in example, 24h)
- excluding the current value (using only previous values in period)
Example of a data frame:
dt <- tibble(
order_id = 1:10,
customer_id = c(1, rep(2, 2), rep(3, 3), rep(4, 4)),
amount = seq(10, 100, by = 10),
date = c("2020-10-07 12:00", # 1st customer
"2020-10-07 12:00", "2020-10-08 11:00", # 2st customer
"2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", # 3rd customer
"2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", "2020-10-08 21:00") # 4th customer
) %>%
mutate(
date = lubridate::ymd_hm(date)
)
Expected result:
order_id customer_id amount date amount__n__last_24h amount__mean__last_24h
<int> <dbl> <dbl> <dttm> <dbl> <dbl>
1 1 10 2020-10-07 12:00:00 0 NA
2 2 20 2020-10-07 12:00:00 0 NA
3 2 30 2020-10-08 11:00:00 1 20
4 3 40 2020-10-07 12:00:00 0 NA
5 3 50 2020-10-08 11:00:00 1 40
6 3 60 2020-10-08 20:00:00 1 50
7 4 70 2020-10-07 12:00:00 0 NA
8 4 80 2020-10-08 11:00:00 1 70
9 4 90 2020-10-08 20:00:00 1 80
10 4 100 2020-10-08 21:00:00 2 85
How can I do this?