I'm trying to run an event study that evaluates whether a specific individual participates in a specific event (event P) after experiencing a specific treatment (event E). I'm doing this by taking the observations of event E, merging them with the observations of event P, and then creating an interval and evaluating it, as shown in the example below:
library(tidyverse)
library(lubridate) # interval(), weeks(), %within%; not attached by tidyverse before 2.0.0
library(fuzzyjoin)
Event_E <- tibble::tribble(
  ~id, ~category, ~date,
  1L, "a", "7/1/2000",
  2L, "b", "11/1/2000",
  3L, "c", "7/1/2002"
) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))
Event_P <- tibble::tribble(
  ~category, ~other_info, ~start, ~end,
  "a", "x", "7/30/2000", "12/31/2000",
  "b", "y", "11/12/2000", "12/31/2001",
  "b", "z", "8/1/2002", "12/31/2002"
) %>%
  mutate_at(vars(start, end), as.Date, format = "%m/%d/%Y")
fuzzy_left_join(
  Event_E, Event_P,
  by = c(
    "category" = "category",
    "date" = "start"
  ),
  match_fun = list(`==`, `<=`)
) %>%
  select(id, category = category.x, date, start) %>%
  group_by(category) %>%
  slice_min(start) %>%
  mutate(
    two_weeks = interval(start = date, end = date + weeks(2)),
    P_within = case_when(start %within% two_weeks ~ "Yes", TRUE ~ "No")
  )
This process works great except for two issues:
1) My actual data is so large that fuzzy_left_join() can't finish because of the duplicate rows it creates. I only need the soonest instance of event P relative to a specific event E, not every instance of event P for an individual who experiences event E.
2) I need to keep observations that have no event P. Individual 3 (category c) experiences event E but never follows up with event P, and gets cut out because of the NA.
Any tips? I'm confident I can solve issue 2 with an additional merge, but I've hit a block on issue 1.
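One direction that might avoid the duplicate blow-up is a rolling join, which matches each event E row only to the nearest qualifying event P row instead of to every later one. The sketch below is not the fuzzyjoin approach above: it swaps in dplyr's join_by() with closest(), so it assumes dplyr 1.1.0 or later is installed. It also assumes a plain 14-day window (date + 14) in place of the lubridate interval, and the name `result` is just for illustration. I have only reasoned about it on the toy data, not on a large table.

```r
library(dplyr) # join_by() and closest() are assumed available (dplyr >= 1.1.0)

Event_E <- tibble::tribble(
  ~id, ~category, ~date,
  1L, "a", "7/1/2000",
  2L, "b", "11/1/2000",
  3L, "c", "7/1/2002"
) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))

Event_P <- tibble::tribble(
  ~category, ~other_info, ~start, ~end,
  "a", "x", "7/30/2000", "12/31/2000",
  "b", "y", "11/12/2000", "12/31/2001",
  "b", "z", "8/1/2002", "12/31/2002"
) %>%
  mutate(across(c(start, end), ~ as.Date(.x, format = "%m/%d/%Y")))

result <- Event_E %>%
  # For each Event_E row, keep only the Event_P row in the same category
  # with the smallest start that is on or after date. Being a left join,
  # Event_E rows without any match are kept with NA in the Event_P columns.
  left_join(Event_P, by = join_by(category, closest(date <= start))) %>%
  mutate(
    # The join already guarantees start >= date, so only the upper bound
    # of the two-week window needs checking; NA starts count as "No".
    P_within = if_else(!is.na(start) & start <= date + 14, "Yes", "No")
  ) %>%
  select(id, category, date, start, P_within)

print(result)
```

On the sample data this should give one row per event E, including id 3 with an NA start, so it would cover issue 2 in the same step. One caveat: if several event P rows in a category share the same closest start date, closest() returns all of the ties, so a distinct() or slice step may still be needed afterwards.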