Formula with interaction terms in event-study designs using R

Question

I am estimating what's often called the "event-study" specification of a difference-in-differences model in R. Basically, we observe treated and control units over time and estimate a two-way fixed effects model with parameters for the "effect" of being treated in each time period (omitting one period, usually the one before treatment, as the reference period). I am struggling with how to compactly specify this model with R formulas.

For example, here is the model...

library(lfe)
library(tidyverse)
library(dummies)

N <- 100

df <- tibble(
    id = rep(1:N, 5),
    treat = id >= ceiling(N / 2),
    time = rep(1:5, each=N),
    x = rnorm(5 * N)
)

# produce an outcome variable
df <- df %>% mutate(
    y = x - treat * (time == 5) + time + rnorm(5*N)
)

head(df)

# easily recover the parameters with the true model...
summary(felm(
    y ~ x + I(treat * (time == 5)) | id + time, data = df
))

Now, I want to do an event-study design using period 4 as the baseline because treatment happens in period 5. We expect coefficients near zero on the pre-periods (1–4), and a negative treatment effect for the treated in the treated period (time == 5)

df$timefac <- factor(df$time, levels = c(4, 1, 2, 3, 5))
summary(felm(
    y ~ x + treat * timefac | id + time, data = df
))

That looks good, but produces lots of NAs because several of the coefficients are absorbed by the unit and time effects. Ideally, I can specify the model without those coefficients...

# create dummy for each time period for treated units
tdum <- dummy(df$time)
df <- bind_cols(df, as.data.frame(tdum))
df <- df %>% mutate_at(vars(time1:time5), ~ . * treat)

# estimate model, manually omitting one dummy
summary(felm(
    y ~ x + time1 + time2 + time3 + time5 | id + time, data = df
))

Now, the question is how to specify this model in a compact way. I thought the following would work, but it produces very unpredictable output...

summary(felm(
     y ~ x + treat:timefac | id + time, data = df
))

With the above, R does not use period 4 as the reference period and sometimes chooses to include the interaction with untreated rather than treated. The output is...

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
x                    0.97198    0.05113  19.009  < 2e-16 ***
treatFALSE:timefac4       NA         NA      NA       NA    
treatTRUE:timefac4  -0.19607    0.28410  -0.690  0.49051    
treatFALSE:timefac1       NA         NA      NA       NA    
treatTRUE:timefac1  -0.07690    0.28572  -0.269  0.78796    
treatFALSE:timefac2       NA         NA      NA       NA    
treatTRUE:timefac2        NA         NA      NA       NA    
treatFALSE:timefac3  0.15525    0.28482   0.545  0.58601    
treatTRUE:timefac3        NA         NA      NA       NA    
treatFALSE:timefac5  0.97340    0.28420   3.425  0.00068 ***
treatTRUE:timefac5        NA         NA      NA       NA

Is there a way to specify this model without having to manually produce dummies and interactions for treated units for every time period?

If you know Stata, I'm essentially looking for something as easy as:

areg y x i.treat##ib4.time, absorb(id)

(Note how simple it is to tell Stata to treat the variable as categorical — the i prefix —without making dummies for time and also indicate that period 4 should be the base period — the b4 prefix.)

score 3 · Answer 1 · answered Jul 24 '20 at 12:00

The package fixest performs fixed-effects estimations (like lfe) and includes utilities to deal with interactions. The function i (or interact) is what you're looking for.

Here is an example where the treatment is interacted with the year and year 5 is dropped out:

library(fixest)
data(base_did)
est_did = feols(y ~ x1 + i(treat, period, 5) | id + period, base_did)
est_did
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080 
#> Fixed-effects: id: 108,  period: 10
#> Standard-errors: Clustered (id) 
#>                   Estimate Std. Error   t value  Pr(>|t|)    
#> x1                0.973490   0.045678 21.312000 < 2.2e-16 ***
#> treat:period::1  -1.403000   1.110300 -1.263700  0.206646    
#> treat:period::2  -1.247500   1.093100 -1.141200  0.254068    
#> treat:period::3  -0.273206   1.106900 -0.246813  0.805106    
#> treat:period::4  -1.795700   1.088000 -1.650500  0.099166 .  
#> treat:period::6   0.784452   1.028400  0.762798  0.445773    
#> treat:period::7   3.598900   1.101600  3.267100  0.001125 ** 
#> treat:period::8   3.811800   1.247500  3.055500  0.002309 ** 
#> treat:period::9   4.731400   1.097100  4.312600   1.8e-05 ***
#> treat:period::10  6.606200   1.120500  5.895800  5.17e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -2,984.58   Adj. R2: 0.48783

The nice thing is that you can plot the interacted coefficients out of the estimation to have a quick visual representation of the results (if you find the graph too sober, no worries you can customize almost everything in it):

coefplot(est_did)

If you don't want to use fixest for estimation, you can still use the function i to create interactions. Its syntax is i(var, f, ref, drop, keep): it interacts the variable var with a dummy variable for each value in f. You can select which values of f to retain with the arguments ref, drop and keep. drop well... drops values from f and ref is the same as drop, but the references are shown in the coefplot (while the values in drop don't appear in the graph).

Here's an example of what i does:

head(with(base_did, i(treat, period, keep = 3:7)))
#>   treat:period::3 treat:period::4 treat:period::5 treat:period::6 treat:period::7
#> 1               0               0               0               0               0
#> 2               0               0               0               0               0
#> 3               1               0               0               0               0
#> 4               0               1               0               0               0
#> 5               0               0               1               0               0
#> 6               0               0               0               1               0
head(with(base_did, i(treat, period, drop = 3:7)))
#>   treat:period::1 treat:period::2 treat:period::8 treat:period::9 treat:period::10
#> 1               1               0               0               0                0
#> 2               0               1               0               0                0
#> 3               0               0               0               0                0
#> 4               0               0               0               0                0
#> 5               0               0               0               0                0
#> 6               0               0               0               0                0

You can find more information on fixest here.

This looks great! I like the clarity with which base periods and interactions (in both the fixed effects and coefficients of interest) are specified. — ChrisP, Jul 27 '20 at 20:57

score 1 · Accepted Answer · answered Jul 19 '20 at 17:22

You can redefine the timefac so that untreated observations are coded as the omitted time category.

df %>% 
  mutate(time = ifelse(treat == 0, 4, time),
         timefac = factor(time, levels = c(4, 1, 2, 3, 5)))

Then, you can use timefac without interactions and get a regression table with no NAs.

summary(felm(
  y ~ x + timefac | id + time, data = df
))

Coefficients:
          Estimate Std. Error t value Pr(>|t|)    
x          0.98548    0.05028  19.599  < 2e-16 ***
time_fac1 -0.01335    0.27553  -0.048    0.961    
time_fac2 -0.10332    0.27661  -0.374    0.709    
time_fac3  0.24169    0.27575   0.876    0.381    
time_fac5 -1.16305    0.27557  -4.221 3.03e-05 ***

This idea came from: https://blogs.worldbank.org/impactevaluations/econometrics-sandbox-event-study-designs-co

Formula with interaction terms in event-study designs using R

2 Answers2