I'm trying to do a 'classic' difference in difference with multiple time periods. The model I want to do is:
y = a + b1x1 + b2_treat + b3_period + b_4(treat*period) + u (eq.1)
So basically I'm testing different setups just to make sure I specify my model in the right way, using different packages. I want to use the fixest-package, so I tried to compare the estimates with the estimates from the standard lm()-package. The results, however, differ -- both coefficients and std.errors.
My question is:
- Is either the lm_mod, lm_mod2 or the feols_mod regressions specified correctly (as in eq.1)?
If not, I would appreciate it if anyone can show me how to get the same results in lm() as in feols()!
# libraries
library(fixest)
library(modelsummary)
library(tidyverse)
# load data
data(base_did)
# make df for lm_mod with 5 as the reference-period
base_ref_5 <- base_did %>%
mutate(period = as.factor(period)) %>%
mutate(period = relevel(period, ref = 5))
# Notice that i use base_ref_5 for the lm model and base_did for the feol_mod.
lm_mod <- lm(y ~ x1 + treat*period, base_ref_5)
lm_mod2 <- lm(y ~ x1 + treat + period + treat*period, base_ref_5)
feols_mod <- feols(y ~ x1 + i(period, treat, ref = 5), base_did)
# compare models
models <- list("lm" = lm_mod,
"lm2" = lm_mod2,
"feols" = feols_mod)
msummary(models, stars = T)
**EDIT:**
the reason why I created base_ref_5 was so that both regressions would have period 5 as the reference period, if that was unclear.
**EDIT 2**:
added a third model (lm_mod2) which is much closer, but there is still a difference.