1

I could not solve this issue even after looking at several answers on stack overflow. I have a dataset with a list of dates and a list of intervals. I need to find how many intervals contains a date, for that given date. I could find several question on how many date were contained in a interval, but this is not what i am looking for.

Here is a reproducible example

d<-data.frame(ID=c(80, 736, 54, 259, 826, 446, 950, 841, 433, 518, 1357, 
     3686, 4042, 749, 2716, 4568, 1424, 332, 1000, 575, 1815, 3074, 
     3768, 932, 4, 3872, 2033, 2495, 3310), 
     date=ymd(c("2022-02-20", "2022-02-21", "2022-02-22", "2022-02-23", "2022-02-24", 
         "2022-02-25", "2022-02-26", "2022-02-27", "2022-02-28", 
         "2022-03-01", "2022-03-02", "2022-03-02", "2022-03-03", "2022-03-04", 
         "2022-03-05", "2022-03-05", "2022-03-06", "2022-03-07", "2022-03-08", 
         "2022-03-09", "2022-03-10", "2022-03-10", "2022-03-10", "2022-03-11", 
         "2022-03-12", "2022-03-12", "2022-03-13", "2022-03-13", "2022-03-13")),
     start.date= ymd(c( NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2022-03-02", "2022-03-02", 
                       "2022-03-03", "2022-03-04", "2022-03-05", "2022-03-05", "2022-03-06", 
                       NA, "2022-03-08", "2022-03-09", "2022-03-10", "2022-03-10", "2022-03-10", 
                       NA, "2022-03-12", "2022-03-12", "2022-03-13", "2022-03-13", "2022-03-13")),
      end.date=ymd(c( NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2022-03-15", "2022-03-10", 
                     "2022-03-07", "2022-03-14", "2022-03-29", "2022-03-17", "2022-03-21", 
                     NA, "2022-03-27", "2022-03-16", "2022-03-16", "2022-03-24", "2022-03-18", 
                     NA, "2022-03-22", "2022-03-18", "2022-03-22", "2022-03-30", "2022-03-19"
      )))

d<-d %>% mutate(interval=(start.date %--% end.date)) %>% select(-start.date,-end.date)

what I would like to get is, in a new column, for each date, the number of interval containing the date.

I normally use dplyr-lubridate, I tried to solve this with purrr but was not able to succeed. Any suggestion? Thank you EDIT:i tried with a purrr solution like the following d %>% mutate(dates_in_intv = map_int(interval, function(x) sum(.$date %within% x))) which is counting over how many dates an interval is spanning, but what I need would be something like d %>% mutate(intv_contains_dates= map_int(date, function(x) sum(.$interval "contains" x)))

mgreco
  • 23
  • 4
  • 1
    I don't understand _"the number of interval containing the date"_, could you explain it better? Best by showing the expected output. – jay.sf May 28 '22 at 07:59
  • thank you, i tried to better explain it now with an edit of the main question – mgreco May 28 '22 at 13:14
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community May 28 '22 at 14:52

1 Answers1

1

Is this what you are looking for?

library(tidyverse)
library(lubridate)
library(ivs)

d <- data.frame(
  ID = c(
    80, 736, 54, 259, 826, 446, 950, 841, 433, 518, 1357,
    3686, 4042, 749, 2716, 4568, 1424, 332, 1000, 575, 1815, 3074,
    3768, 932, 4, 3872, 2033, 2495, 3310
  ),
  date = ymd(c(
    "2022-02-20", "2022-02-21", "2022-02-22", "2022-02-23", "2022-02-24",
    "2022-02-25", "2022-02-26", "2022-02-27", "2022-02-28",
    "2022-03-01", "2022-03-02", "2022-03-02", "2022-03-03", "2022-03-04",
    "2022-03-05", "2022-03-05", "2022-03-06", "2022-03-07", "2022-03-08",
    "2022-03-09", "2022-03-10", "2022-03-10", "2022-03-10", "2022-03-11",
    "2022-03-12", "2022-03-12", "2022-03-13", "2022-03-13", "2022-03-13"
  )),
  start.date = ymd(c(
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2022-03-02", "2022-03-02",
    "2022-03-03", "2022-03-04", "2022-03-05", "2022-03-05", "2022-03-06",
    NA, "2022-03-08", "2022-03-09", "2022-03-10", "2022-03-10", "2022-03-10",
    NA, "2022-03-12", "2022-03-12", "2022-03-13", "2022-03-13", "2022-03-13"
  )),
  end.date = ymd(c(
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2022-03-15", "2022-03-10",
    "2022-03-07", "2022-03-14", "2022-03-29", "2022-03-17", "2022-03-21",
    NA, "2022-03-27", "2022-03-16", "2022-03-16", "2022-03-24", "2022-03-18",
    NA, "2022-03-22", "2022-03-18", "2022-03-22", "2022-03-30", "2022-03-19"
  ))
)

d %>% 
  mutate(iv = iv(start.date, end.date),
         count = iv_count_between(date, iv))
#>      ID       date start.date   end.date                       iv count
#> 1    80 2022-02-20       <NA>       <NA>                 [NA, NA)     0
#> 2   736 2022-02-21       <NA>       <NA>                 [NA, NA)     0
#> 3    54 2022-02-22       <NA>       <NA>                 [NA, NA)     0
#> 4   259 2022-02-23       <NA>       <NA>                 [NA, NA)     0
#> 5   826 2022-02-24       <NA>       <NA>                 [NA, NA)     0
#> 6   446 2022-02-25       <NA>       <NA>                 [NA, NA)     0
#> 7   950 2022-02-26       <NA>       <NA>                 [NA, NA)     0
#> 8   841 2022-02-27       <NA>       <NA>                 [NA, NA)     0
#> 9   433 2022-02-28       <NA>       <NA>                 [NA, NA)     0
#> 10  518 2022-03-01       <NA>       <NA>                 [NA, NA)     0
#> 11 1357 2022-03-02 2022-03-02 2022-03-15 [2022-03-02, 2022-03-15)     2
#> 12 3686 2022-03-02 2022-03-02 2022-03-10 [2022-03-02, 2022-03-10)     2
#> 13 4042 2022-03-03 2022-03-03 2022-03-07 [2022-03-03, 2022-03-07)     3
#> 14  749 2022-03-04 2022-03-04 2022-03-14 [2022-03-04, 2022-03-14)     4
#> 15 2716 2022-03-05 2022-03-05 2022-03-29 [2022-03-05, 2022-03-29)     6
#> 16 4568 2022-03-05 2022-03-05 2022-03-17 [2022-03-05, 2022-03-17)     6
#> 17 1424 2022-03-06 2022-03-06 2022-03-21 [2022-03-06, 2022-03-21)     7
#> 18  332 2022-03-07       <NA>       <NA>                 [NA, NA)     6
#> 19 1000 2022-03-08 2022-03-08 2022-03-27 [2022-03-08, 2022-03-27)     7
#> 20  575 2022-03-09 2022-03-09 2022-03-16 [2022-03-09, 2022-03-16)     8
#> 21 1815 2022-03-10 2022-03-10 2022-03-16 [2022-03-10, 2022-03-16)    10
#> 22 3074 2022-03-10 2022-03-10 2022-03-24 [2022-03-10, 2022-03-24)    10
#> 23 3768 2022-03-10 2022-03-10 2022-03-18 [2022-03-10, 2022-03-18)    10
#> 24  932 2022-03-11       <NA>       <NA>                 [NA, NA)    10
#> 25    4 2022-03-12 2022-03-12 2022-03-22 [2022-03-12, 2022-03-22)    12
#> 26 3872 2022-03-12 2022-03-12 2022-03-18 [2022-03-12, 2022-03-18)    12
#> 27 2033 2022-03-13 2022-03-13 2022-03-22 [2022-03-13, 2022-03-22)    15
#> 28 2495 2022-03-13 2022-03-13 2022-03-30 [2022-03-13, 2022-03-30)    15
#> 29 3310 2022-03-13 2022-03-13 2022-03-19 [2022-03-13, 2022-03-19)    15

Created on 2022-05-28 by the reprex package (v2.0.1)

And a purrr approach:

period_df <- d |> 
  select(start.date, end.date) |> 
  drop_na(start.date) 

map2_dfr(period_df$start.date, period_df$end.date, function(x, y) {
  
  d |> 
    distinct(date) |> 
    mutate(count = if_else(date >= x & date <= y, 1, 0))
}) |> 
  group_by(date) |> 
  summarise(count = sum(count)) |> 
  arrange(desc(count))
Carl
  • 4,232
  • 2
  • 12
  • 24
  • Hi Carl, Thank you for you answer! I see with this solution I can identify if the date is between the corresponding interval (start.date,end.date) on the same row, and count how many time time this is true. What I actually need is for each date to loop over all the interavls (all the pairs start.date-end.date) of the column, and count in how many intervals the date is present). This is why I was looking at a purrr solution (or maybe a for loop). – mgreco May 28 '22 at 12:45
  • I tried to better explain it now with an edit of the main question – mgreco May 28 '22 at 13:14
  • Does this update do what you need? – Carl May 28 '22 at 13:49
  • YES! Thank you! This is what I was looking for !!I was not aware of the ivs package, I will study it. I wonder whether there was also a purrr solution, but the ivs package seems to work fine! I marked the question as solved! – mgreco May 28 '22 at 16:55
  • 1
    I've added a purrr approach too. – Carl May 28 '22 at 17:34