0

I have an R df as below:

| date_entered | returning | new |
| ------------ | --------- | --- |
| 2021-06-02   |         0 |  14 |
| 2021-06-03   |        12 |   8 |
| 2021-06-04   |         8 |   0 |
| 2021-06-05   |         7 |  15 |
| 2021-06-07   |        10 |   4 |
| 2021-06-08   |        12 |  22 |
| 2021-06-09   |         4 |  15 |
| 2021-06-10   |         0 |  28 |
| 2021-06-12   |        22 |  16 |
| 2021-06-15   |        20 |   5 |
| 2021-06-16   |        18 |  12 |
| 2021-06-17   |         2 |  10 |
| 2021-06-18   |        12 |   8 |

  • Note that there are missing dates. I need to find the sum of the "new" column from above after every seventh day like so:
| date_range              | sum_new |
| ----------------------- | ------- |
| 2021-06-02 - 2021-06-08 |      53 |
| 2021-06-09 - 2021-06-15 |      49 |
| 2021-06-16 - 2021-06-22 |      30 |

I tried the following:

library(zoo)
n <- 7
rollapply(df$new, n, sum, by = n)

but it ignores the dates and just adds after every seventh value of the df

| sum_new |
| ------- |
|      78 |
|      79 |

which is not what I am trying to get.

doubleD
  • 269
  • 1
  • 12

3 Answers3

2

You can use cut to divide the dataset into 7-day interval and sum the new value.

library(dplyr)

df %>%
  group_by(group = cut(date_entered, '7 days')) %>%
  summarise(date_range = paste(min(date_entered), min(date_entered) + 6, sep = '-'), 
            sum_new = sum(new)) %>%
  select(-group)

#  date_range            sum_new
#  <chr>                   <int>
#1 2021-06-02-2021-06-08      63
#2 2021-06-09-2021-06-15      64
#3 2021-06-16-2021-06-22      30
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

We assume the input shown reproducibly in the Note at the end.

Remove the returning column, convert to zoo, which also converts the character dates to Date class and then convert that to ts. The conversion to ts will insert the missing dates giving new a value of NA for those. Then convert that back to zoo and fill in the NA's with 0. Next use rollapply. We align to the left and use partial=TRUE so that the partial week at the end is included. The dates that appear will be the beginning of each 7 day interval. Because ts cannot represent Date class the times will be numbers so convert them back to Date using aggregate giving the sums as a zoo object, sum_new. Optionally convert that to a data frame (or omit that and just use sum_new).

library(zoo)

sum_new <- df |>
  subset(select = -returning) |>
  read.zoo() |>
  as.ts() |>
  as.zoo() |>
  na.fill(0) |>
  rollapply(7, by = 7, sum, partial = TRUE, align = "left") |>
  aggregate(as.Date)

fortify.zoo(sum_new, name = "date")

giving:

        date sum_new
1 2021-06-02      63
2 2021-06-09      64
3 2021-06-16      30

Note

df <-
structure(list(date_entered = c("2021-06-02", "2021-06-03", "2021-06-04", 
"2021-06-05", "2021-06-07", "2021-06-08", "2021-06-09", "2021-06-10", 
"2021-06-12", "2021-06-15", "2021-06-16", "2021-06-17", "2021-06-18"
), returning = c(0L, 12L, 8L, 7L, 10L, 12L, 4L, 0L, 22L, 20L, 
18L, 2L, 12L), new = c(14L, 8L, 0L, 15L, 4L, 22L, 15L, 28L, 16L, 
5L, 12L, 10L, 8L)), class = "data.frame", row.names = c(NA, -13L
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

My solution is to group the date_entered by your range, and then sum through
each group.

dt %>% 
    group_by(dr = findInterval(date_entered, as.Date(c("2021-06-09", "2021-06-16")))) %>%
    summarise(sum_new = sum(new))

Or with data.table

dt <- fread("
date_entered  returning  new 
2021-06-02            0   14 
2021-06-03           12    8 
2021-06-04            8    0 
2021-06-05            7   15 
2021-06-07           10    4 
2021-06-08           12   22 
2021-06-09            4   15 
2021-06-10            0   28 
2021-06-12           22   16 
2021-06-15           20    5 
2021-06-16           18   12 
2021-06-17            2   10 
2021-06-18           12    8 ")

dt[,.(sum_new = sum(new)), by = .(dr = findInterval(date_entered, as.Date(c("2021-06-09", "2021-06-16"))))]
Peace Wang
  • 2,399
  • 1
  • 8
  • 15
  • Try: `fi <- function(x) findInterval(x, seq(x[1]+7, tail(x, 1), 7)); df %>% group_by(g = fi(as.Date(date_entered))) %>% summarize(date = max(date_entered), sum_new = sum(new)) %>% select(-g)` – G. Grothendieck Jul 20 '21 at 12:49