1

I want to determine the running total of a column within a specified date window.

id date value
7 2023-01-01 1
7 2023-01-03 1
7 2023-01-04 3
7 2023-01-05 2
7 2023-01-06 1
7 2023-01-07 5
7 2023-01-10 3
7 2023-01-14 2
15 2023-01-01 1
15 2023-01-02 1
15 2023-01-04 2
15 2023-01-07 2
15 2023-01-12 1
15 2023-01-13 1
15 2023-01-14 10

Given the table above, I need to determine the max rolling sum for any 7-day window for each id. (first window would 1/1 to 1/7, second 1/2-1/8, etc...)

In other words, for id 7, that's 14 for the 1/4-1/10 windows (3 + 2 + 1 + 5 + 3). For 15, that's 12 for the 1/8-1/14 window (1 + 1 + 10).

I was using the runner library, but I can't quite get what I should be getting.

Edit: This worked if the max is not in the first few windows:

library(runner)

df %>%
group_by(id) %>%
mutate(rsum = runner(value, k=7, idx=date, f=function(x) sum(x), na_pad=T)) %>%
filter(!is.na(rsum)) %>%
summrise(max_rsum = max(rsum))
Bea
  • 15
  • 6

2 Answers2

1

This approach assumes that each date value is unique within the group it falls in. It involves using complete() to add "dummy" records for each missing day/date that falls within the date range for each group. Doing so means 7 rows == 1 week. Rolling sums for each 7 day period (e.g. 7 rows) can then be calculated using roll_sum() from the RcppRoll package. As an example of the desired output was not provided, it is assumed a summary tibble is sufficient:

library(dplyr)
library(tidyr)
library(RcppRoll)

# Your example table as a dataframe
df <- read.table(text = "id     date    value
7   2023-01-01  1
7   2023-01-03  1
7   2023-01-04  3
7   2023-01-05  2
7   2023-01-06  1
7   2023-01-07  5
7   2023-01-10  3
7   2023-01-14  2
15  2023-01-01  1
15  2023-01-02  1
15  2023-01-04  2
15  2023-01-07  2
15  2023-01-12  1
15  2023-01-13  1
15  2023-01-14  10", header = T)
df$date <- as.Date(df$date)

df %>% 
  arrange(id, date) %>%
  group_by(id) %>%
  complete(date = full_seq(date, 1), fill = list(value = 0)) %>%
  mutate(max_7day_rollsum = roll_sum(value, 7, align = "right", fill = 0)) %>%
  filter(max_7day_rollsum == max(max_7day_rollsum)) %>%
  select(-value)

and the result:

# A tibble: 2 × 3
# Groups:   id [2]
     id date       max_7day_rollsum
  <int> <date>                <dbl>
1     7 2023-01-10               14
2    15 2023-01-14               12
L Tyrone
  • 1,268
  • 3
  • 15
  • 24
0

rollapplyr takes as arguments the value, vector of widths (how many values to sum at each point) and the function to apply (sum). Run that for each group and then take the maximum.

library(dplyr, exclude = c("filter", "lag"))
library(zoo)

df %>%
  mutate(date = as.Date(date)) %>% # can omit if already Date class
  group_by(id) %>%
  mutate(max = rollapplyr(value, 1:n() - findInterval(date - 7, date), sum)) %>%
  slice_max(max) %>%
  ungroup %>%
  select(-value)

giving:

# A tibble: 2 × 3
# Groups:   id [2]
     id date         max
  <int> <date>     <int>
1     7 2023-01-10    14
2    15 2023-01-14    12

Note

The input in reproducible form

df <- structure(list(id = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 15L, 15L, 
15L, 15L, 15L, 15L, 15L), date = c("2023-01-01", "2023-01-03", 
"2023-01-04", "2023-01-05", "2023-01-06", "2023-01-07", "2023-01-10", 
"2023-01-14", "2023-01-01", "2023-01-02", "2023-01-04", "2023-01-07", 
"2023-01-12", "2023-01-13", "2023-01-14"), value = c(1L, 1L, 
3L, 2L, 1L, 5L, 3L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 10L)), 
class = "data.frame", row.names = c(NA, -15L))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341