This is a variation of the question posed here. I really liked the dplyr solution there, but I cannot figure out a way to do it in the following case without looping, which is not ideal in a massive dataset.
Suppose I have a key dataframe df1
that contains all key dates for each id, including some ids with >1 key date:
id keydate
1 id1 2015-03-06
2 id1 2017-04-10
3 id1 2021-08-20
4 id2 2018-12-11
5 id2 2020-12-15
6 id3 2019-06-04
df1 <- structure(list(id = structure(1:6, .Label = c("id1", "id1", "id1", "id2", "id2", "id3"), class = "factor"),
date = structure(c(16500, 17266, 18859, 17876, 18611, 18051), class = "Date")),
.Names = c("id", "keydate"),
row.names = c(NA, -6L), class = "data.frame")
df1
will define the start date of a 10-day window of interest.
I have a second dataframe df2
that has ids and dates:
id date
1 id1 2015-03-10
2 id1 2021-08-29
3 id1 2021-10-19
4 id2 2018-12-20
5 id2 2021-01-29
6 id3 2019-07-12
7 id3 2019-08-23
df2 <- structure(list(id = structure(1:7, .Label = c("id1", "id1", "id1", "id2", "id2", "id3", "id3"), class = "factor"),
date = structure(c(16504, 18868, 18919, 17885, 18656, 18089, 18131), class = "Date")),
.Names = c("id", "date"),
row.names = c(NA, -7L), class = "data.frame")
I am trying to find all rows in df2
where keydate <= date <= keydate + 10 for all keydates in df1
matched by id. Desired output would look like this:
id date
1 id1 2015-03-10
2 id1 2021-08-29
3 id2 2018-12-20
specifically returning rows 1, 2, and 4 of df2
.