find the sum after every seventh day but with missing days in R

Question

I have an R df as below:

| date_entered | returning | new |
| ------------ | --------- | --- |
| 2021-06-02   |         0 |  14 |
| 2021-06-03   |        12 |   8 |
| 2021-06-04   |         8 |   0 |
| 2021-06-05   |         7 |  15 |
| 2021-06-07   |        10 |   4 |
| 2021-06-08   |        12 |  22 |
| 2021-06-09   |         4 |  15 |
| 2021-06-10   |         0 |  28 |
| 2021-06-12   |        22 |  16 |
| 2021-06-15   |        20 |   5 |
| 2021-06-16   |        18 |  12 |
| 2021-06-17   |         2 |  10 |
| 2021-06-18   |        12 |   8 |

Note that there are missing dates. I need to find the sum of the "new" column from above after every seventh day like so:

| date_range              | sum_new |
| ----------------------- | ------- |
| 2021-06-02 - 2021-06-08 |      53 |
| 2021-06-09 - 2021-06-15 |      49 |
| 2021-06-16 - 2021-06-22 |      30 |

I tried the following:

library(zoo)
n <- 7
rollapply(df$new, n, sum, by = n)

but it ignores the dates and just adds after every seventh value of the df

| sum_new |
| ------- |
|      78 |
|      79 |

which is not what I am trying to get.

It's better if you provide the data with code in the end，i.e., `dput(df)`. — Peace Wang, Jul 20 '21 at 01:55
Because my problem is how to be efficient to copy then import your data into R. — Peace Wang, Jul 20 '21 at 02:18

score 2 · Answer 1 · answered Jul 20 '21 at 02:50

You can use cut to divide the dataset into 7-day interval and sum the new value.

library(dplyr)

df %>%
  group_by(group = cut(date_entered, '7 days')) %>%
  summarise(date_range = paste(min(date_entered), min(date_entered) + 6, sep = '-'), 
            sum_new = sum(new)) %>%
  select(-group)

#  date_range            sum_new
#  <chr>                   <int>
#1 2021-06-02-2021-06-08      63
#2 2021-06-09-2021-06-15      64
#3 2021-06-16-2021-06-22      30

G. Grothendieck · Answer 2 · 2021-07-20T12:08:50.627

We assume the input shown reproducibly in the Note at the end.

Remove the returning column, convert to zoo, which also converts the character dates to Date class and then convert that to ts. The conversion to ts will insert the missing dates giving new a value of NA for those. Then convert that back to zoo and fill in the NA's with 0. Next use rollapply. We align to the left and use partial=TRUE so that the partial week at the end is included. The dates that appear will be the beginning of each 7 day interval. Because ts cannot represent Date class the times will be numbers so convert them back to Date using aggregate giving the sums as a zoo object, sum_new. Optionally convert that to a data frame (or omit that and just use sum_new).

library(zoo)

sum_new <- df |>
  subset(select = -returning) |>
  read.zoo() |>
  as.ts() |>
  as.zoo() |>
  na.fill(0) |>
  rollapply(7, by = 7, sum, partial = TRUE, align = "left") |>
  aggregate(as.Date)

fortify.zoo(sum_new, name = "date")

giving:

        date sum_new
1 2021-06-02      63
2 2021-06-09      64
3 2021-06-16      30

Note

df <-
structure(list(date_entered = c("2021-06-02", "2021-06-03", "2021-06-04", 
"2021-06-05", "2021-06-07", "2021-06-08", "2021-06-09", "2021-06-10", 
"2021-06-12", "2021-06-15", "2021-06-16", "2021-06-17", "2021-06-18"
), returning = c(0L, 12L, 8L, 7L, 10L, 12L, 4L, 0L, 22L, 20L, 
18L, 2L, 12L), new = c(14L, 8L, 0L, 15L, 4L, 22L, 15L, 28L, 16L, 
5L, 12L, 10L, 8L)), class = "data.frame", row.names = c(NA, -13L

score 0 · Answer 3 · answered Jul 20 '21 at 02:43

My solution is to group the date_entered by your range, and then sum through
each group.

dt %>% 
    group_by(dr = findInterval(date_entered, as.Date(c("2021-06-09", "2021-06-16")))) %>%
    summarise(sum_new = sum(new))

Or with data.table

dt <- fread("
date_entered  returning  new 
2021-06-02            0   14 
2021-06-03           12    8 
2021-06-04            8    0 
2021-06-05            7   15 
2021-06-07           10    4 
2021-06-08           12   22 
2021-06-09            4   15 
2021-06-10            0   28 
2021-06-12           22   16 
2021-06-15           20    5 
2021-06-16           18   12 
2021-06-17            2   10 
2021-06-18           12    8 ")

dt[,.(sum_new = sum(new)), by = .(dr = findInterval(date_entered, as.Date(c("2021-06-09", "2021-06-16"))))]

Try: `fi <- function(x) findInterval(x, seq(x[1]+7, tail(x, 1), 7)); df %>% group_by(g = fi(as.Date(date_entered))) %>% summarize(date = max(date_entered), sum_new = sum(new)) %>% select(-g)` — G. Grothendieck, Jul 20 '21 at 12:49

find the sum after every seventh day but with missing days in R

3 Answers3

Note

Linked