
I want to make an event study plot in R. I have a dataset like this:

library(tidyverse)

groups <- c("A", "B", "C", "D", "E")

data <- tibble(id=1:10000,
                date=sample(seq(as.Date("2006-01-01"), 
                                as.Date("2019-01-01"), by="day"),
                            10000, replace = TRUE),
                group=sample(groups, 10000, replace=TRUE),
                treat=ifelse(group %in% c("A", "B"), 1, 0),
                after=ifelse(date>as.Date("2015-05-01"), 1, 0),
                results=rnorm(10000)+ifelse(treat*after==1, 0.2, 0)
)

How can I make an event study plot like this that shows the difference between the results of the treated and the untreated per year?

Ajern

2 Answers


It works like this:

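First, calculate the standard error of the difference in means for each year:

library(lubridate)  # provides year(); attached automatically only by tidyverse >= 2.0.0

se <- data %>%
  group_by(year = year(date), treat) %>%
  summarize(var = var(results, na.rm = TRUE), n = n(), .groups = "drop_last") %>%
  # unpooled standard error of the difference: sqrt(var_1/n_1 + var_0/n_0)
  summarize(se = sqrt(sum(var / n)))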

Now determine the mean difference between treated and untreated and plot the event study plot:

data %>%
  group_by(year = year(date), treat) %>%
  summarize(mean_effect = mean(results, na.rm = TRUE), .groups = "drop") %>%
  # one row per year, with columns `0` (untreated) and `1` (treated)
  pivot_wider(names_from = treat, values_from = mean_effect) %>%
  mutate(diff = `1` - `0`) %>%
  # attach the yearly standard errors by year instead of by row position
  left_join(se, by = "year") %>%
  ggplot(aes(x = year, y = diff)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = diff - 1.96 * se, ymax = diff + 1.96 * se), colour = "black", width = .1) +
  geom_vline(xintercept = 2015, linetype = "dashed") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Example event study plot", x = "Year", y = "Difference and 95% Conf. Int.")

For the confidence intervals I used the normal quantile 1.96, assuming that the yearly mean differences are approximately normally distributed because of the large sample size (10,000 observations).

[Image: event study plot]

Simon
    The standard error of a difference in means has a specific formula which depends on the variance and the number of observations of each group (not the pooled variance / pooled n). Also, a 95% CI is not equal to 0.95 * se, but to se * the value of the student t distribution with the right number of degrees of freedom such that 97.5% of the distribution is below this value. It can be obtained using `qt(0.975, n() - 2)`, and will never be as low as 0.475, but rather close to 2. So the correct confidence intervals are actually larger than what is displayed here. – L-- Mar 27 '23 at 14:44
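
The comment's suggestion can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answer author's code: it re-creates the question's simulated data with an arbitrary seed, drops 2019 (the sampled date range contains only a single day of it), uses the unpooled standard error, and reads the comment's `n() - 2` as the total number of observations in a year minus 2. The names `avg`, `df`, `t_crit`, `lower` and `upper` are made up for this sketch. A Welch-Satterthwaite degrees-of-freedom formula would be another option.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Same simulated data as in the question
groups <- c("A", "B", "C", "D", "E")
data <- tibble(
  id = 1:10000,
  date = sample(seq(as.Date("2006-01-01"), as.Date("2019-01-01"), by = "day"),
                10000, replace = TRUE),
  group = sample(groups, 10000, replace = TRUE),
  treat = ifelse(group %in% c("A", "B"), 1, 0),
  after = ifelse(date > as.Date("2015-05-01"), 1, 0),
  results = rnorm(10000) + ifelse(treat * after == 1, 0.2, 0)
)

ci <- data %>%
  # only one day of 2019 is in the sampled range, so drop that year
  filter(date < as.Date("2019-01-01")) %>%
  group_by(year = year(date), treat) %>%
  summarize(avg = mean(results), var = var(results), n = n(),
            .groups = "drop_last") %>%
  summarize(
    diff = avg[treat == 1] - avg[treat == 0],  # treated minus untreated
    se   = sqrt(sum(var / n)),                 # unpooled SE of the difference
    df   = sum(n) - 2                          # assumption: total n in the year minus 2
  ) %>%
  mutate(
    t_crit = qt(0.975, df),                    # t quantile instead of 1.96
    lower  = diff - t_crit * se,
    upper  = diff + t_crit * se
  )

ci
```

With several hundred observations per year the t quantile is only slightly above 1.96, so the choice between the two matters far less than using the right standard error.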

This should do the job:

gg.data <-
  data %>%
  ## format the factorial variables correctly
  mutate(
    treat = factor(treat, levels = c(0, 1), labels = c('control', 'treated')), 
    after = factor(after, levels = c(0, 1), labels = c('before 2015-05', 'after 2015-05'))
  ) %>%
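  ## compute the mean of the results and a 95% CI for that mean
  group_by(
    treat,
    after
  ) %>%
  summarise(
    results.mean = mean(results),
    ## normal-approximation CI: mean +/- 1.96 * standard error of the mean
    results.se = sd(results) / sqrt(n()),
    results.lower = results.mean - 1.96 * results.se,
    results.upper = results.mean + 1.96 * results.se,
    .groups = 'drop'
  )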

ggplot(gg.data, aes(x = after, y = results.mean, ymin = results.lower, ymax = results.upper, color = treat, group = treat)) +
  geom_pointrange(position = position_dodge(width = .1)) +
  geom_line(position = position_dodge(width = .1)) + 
  theme_light()


Note 1: position_dodge(width = .1) is important to keep the point ranges of the two groups from overlapping

Note 2: you can use geom_errorbar() instead of geom_pointrange() if you want horizontal caps at the ends of the CIs
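
A minimal sketch of the geom_errorbar() variant from Note 2. The summary values in `gg.demo` are made up purely for illustration (they are not results from the question's data), and `gg.demo`, `dodge` and `p` are hypothetical names. The same `position_dodge()` object is passed to every layer so the bars, points and lines stay aligned.

```r
library(ggplot2)

# Hypothetical summary table, invented only to make the sketch self-contained
gg.demo <- data.frame(
  after = factor(rep(c("before 2015-05", "after 2015-05"), each = 2),
                 levels = c("before 2015-05", "after 2015-05")),
  treat = factor(rep(c("control", "treated"), times = 2)),
  results.mean  = c(0.00, 0.01, 0.00, 0.20),
  results.lower = c(-0.03, -0.03, -0.04, 0.15),
  results.upper = c(0.03, 0.05, 0.04, 0.25)
)

# Share one dodge specification across all layers
dodge <- position_dodge(width = 0.1)

p <- ggplot(gg.demo, aes(x = after, y = results.mean, color = treat, group = treat)) +
  # width controls the length of the horizontal caps
  geom_errorbar(aes(ymin = results.lower, ymax = results.upper),
                width = 0.05, position = dodge) +
  geom_point(position = dodge) +
  geom_line(position = dodge) +
  theme_light()

p
```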