
I'm trying to run an event study that evaluates whether a specific individual participates in a specific event (event P) after experiencing a specific treatment (event E). To do this, I take the observations of event E, merge them with the observations of event P, create a two-week interval after each event E, and check whether event P falls inside it, as shown in the example below:

library(tidyverse)
library(fuzzyjoin)
library(lubridate)  # interval(), weeks() and %within%; not attached by tidyverse before 2.0.0

Event_E <- tibble::tribble(
  ~id, ~category,       ~date,
  1L,       "a",  "7/1/2000",
  2L,       "b", "11/1/2000",
  3L,       "c",  "7/1/2002"
) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))

Event_P <- tibble::tribble(
  ~category, ~other_info,     ~start,         ~end,
  "a",         "x", "7/30/2000", "12/31/2000",
  "b",         "y", "11/12/2000", "12/31/2001",
  "b",         "z", "8/1/2002", "12/31/2002"
) %>%
  mutate(across(c(start, end), ~ as.Date(.x, format = "%m/%d/%Y")))


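# Match each event E to every event P in the same category that starts on or
# after the event E date (this is the step that creates the duplicate rows)
fuzzy_left_join(
  Event_E, Event_P,
  by = c(
    "category" = "category",
    "date" = "start"
  ),
  match_fun = list(`==`, `<=`)
) %>%
  select(id, category = category.x, date, start) %>%
  # keep only the soonest event P for each event E (one row per id)
  group_by(id) %>%
  slice_min(start) %>%
  # flag whether that event P falls within two weeks of event E
  mutate(
    two_weeks = interval(start = date, end = date + weeks(2)),
    P_within = case_when(start %within% two_weeks ~ "Yes", TRUE ~ "No")
  )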

This process works well except for two issues:

1) My actual data is so large that fuzzy_left_join() cannot finish because of all the duplicate rows it creates. I only need the soonest instance of event P after a given event E, not every instance of event P for an individual who experiences event E.
2) I need to keep observations that have no event P. Individual 3/category c experiences event E but never follows up with event P, and is dropped because of the NA.

Any tips? I'm confident I can solve issue 2 with an additional merge, but I'm stuck on issue 1.
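For comparison, a minimal sketch of a different technique (not the fuzzyjoin approach above): a rolling join with dplyr's `join_by(closest())`, which assumes dplyr >= 1.1.0. `closest(date <= start)` matches each event E only to the event P in the same category with the smallest `start` on or after `date`, so the many-to-many expansion never happens. Because it is a `left_join()`, event E rows without a later event P (id 3) are kept with `NA`, and `multiple = "first"` keeps a single row per event E if two events P share the same start date.

```r
library(dplyr)  # join_by() and closest() require dplyr >= 1.1.0

# Example data from the question
Event_E <- tibble::tribble(
  ~id, ~category,       ~date,
  1L,       "a",  "7/1/2000",
  2L,       "b", "11/1/2000",
  3L,       "c",  "7/1/2002"
) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))

Event_P <- tibble::tribble(
  ~category, ~other_info,      ~start,         ~end,
  "a",               "x",  "7/30/2000", "12/31/2000",
  "b",               "y", "11/12/2000", "12/31/2001",
  "b",               "z",   "8/1/2002", "12/31/2002"
) %>%
  mutate(across(c(start, end), ~ as.Date(.x, format = "%m/%d/%Y")))

# For each event E, take the event P in the same category with the smallest
# start on or after the event E date; events E with no such P keep NA columns
result <- left_join(
  Event_E, Event_P,
  by = join_by(category, closest(date <= start)),
  multiple = "first"  # break ties in start by keeping the first matching P row
) %>%
  # closest() already guarantees start >= date, so only the upper bound of
  # the two-week window needs checking; NA starts become "No"
  mutate(P_within = if_else(!is.na(start) & start <= date + 14, "Yes", "No"))

result
```

The two-week check uses plain date arithmetic instead of lubridate's `interval()`/`%within%`; both treat the window as inclusive at each end.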

Arthur Yip
EconMatt

1 Answer


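Setting max_dist = 30 might help if you know the soonest instance of event P always falls within 30 days of event E (note that max_dist is an argument of fuzzyjoin's difference_join() family, such as difference_left_join(), rather than of fuzzy_left_join()). Alternatively, you could split Event_E into 10 chunks, loop through them with fuzzy_left_join(), and then combine the results with bind_rows().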
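A minimal sketch combining both suggestions on the question's example data, under stated assumptions: the 30-day bound is expressed as a third `fuzzy_left_join()` condition on a helper column `window_end` (a hypothetical name) instead of through `max_dist`, and each chunk is reduced to the soonest event P per `id` before the chunks are combined, so only one chunk's duplicate rows exist at a time. `n_chunks`, `window_days` and `soonest_p()` are illustrative names, not part of fuzzyjoin.

```r
library(dplyr)
library(fuzzyjoin)

# Example data from the question
Event_E <- tibble::tribble(
  ~id, ~category,       ~date,
  1L,       "a",  "7/1/2000",
  2L,       "b", "11/1/2000",
  3L,       "c",  "7/1/2002"
) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))

Event_P <- tibble::tribble(
  ~category, ~other_info,      ~start,         ~end,
  "a",               "x",  "7/30/2000", "12/31/2000",
  "b",               "y", "11/12/2000", "12/31/2001",
  "b",               "z",   "8/1/2002", "12/31/2002"
) %>%
  mutate(across(c(start, end), ~ as.Date(.x, format = "%m/%d/%Y")))

window_days <- 30  # assumed upper bound on the gap between event E and event P
n_chunks <- 2      # the answer suggests 10 for the real data; 2 suits this 3-row example
chunk_size <- ceiling(nrow(Event_E) / n_chunks)

# Hypothetical helper: join one chunk of Event_E and immediately reduce it to
# the soonest event P per id
soonest_p <- function(chunk) {
  chunk %>%
    mutate(window_end = date + window_days) %>%
    fuzzy_left_join(
      Event_P,
      by = c("category" = "category", "date" = "start", "window_end" = "start"),
      match_fun = list(`==`, `<=`, `>=`)  # same category, date <= start <= window_end
    ) %>%
    select(id, category = category.x, date, start) %>%
    group_by(id) %>%
    arrange(start, .by_group = TRUE) %>%  # missing starts sort last
    slice_head(n = 1) %>%                 # keeps the NA row when an id has no match
    ungroup()
}

chunks <- split(Event_E, ceiling(seq_len(nrow(Event_E)) / chunk_size))
result <- bind_rows(lapply(chunks, soonest_p)) %>%
  mutate(P_within = if_else(!is.na(start) & start <= date + 14, "Yes", "No"))

result
```

Since the question only needs a two-week flag, any `window_days` of at least 14 leaves `P_within` unchanged; a tighter bound just means an event P beyond the window shows up as `NA` in `start`, which matters only if the soonest event P date is needed for something else.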

Arthur Yip