1

I'm trying to do a 'classic' difference in difference with multiple time periods. The model I want to do is:

y = a + b1x1 + b2_treat + b3_period + b_4(treat*period) + u (eq.1)

So basically I'm testing different setups just to make sure I specify my model in the right way, using different packages. I want to use the fixest-package, so I tried to compare the estimates with the estimates from the standard lm()-package. The results, however, differ -- both coefficients and std.errors.

My question is:

  1. Is either the lm_mod, lm_mod2 or the feols_mod regressions specified correctly (as in eq.1)?

If not, I would appreciate it if anyone can show me how to get the same results in lm() as in feols()!

# libraries
library(fixest)
library(modelsummary)
library(tidyverse)

# load data
data(base_did)

# make df for lm_mod with 5 as the reference-period
base_ref_5 <- base_did %>% 
  mutate(period = as.factor(period)) %>% 
  mutate(period = relevel(period, ref = 5))

# Notice that i use base_ref_5 for the lm model and base_did for the feol_mod.
lm_mod <- lm(y ~ x1 + treat*period, base_ref_5)
lm_mod2 <- lm(y ~ x1 + treat + period + treat*period, base_ref_5)
feols_mod <- feols(y ~ x1 + i(period, treat, ref = 5), base_did)

# compare models
models <- list("lm" = lm_mod, 
               "lm2" = lm_mod2,
               "feols" = feols_mod)

msummary(models, stars = T)

**EDIT:** 
the reason why I created base_ref_5 was so that both regressions would have period 5 as the reference period, if that was unclear.

**EDIT 2**: 
added a third model (lm_mod2) which is much closer, but there is still a difference.

Vincent
  • 15,809
  • 7
  • 37
  • 39
Tomas R
  • 440
  • 2
  • 10

1 Answers1

2

There are two issues, here.

  1. In the lm() model, the period variable is interacted, but treated as a continuous numeric variable. In contrast, calling i(period, treat) treats period as a factor (this is explained clearly in the documentation).
  2. The i() function only includes the interactions, and not the constitutive terms.

Here are two models to illustrate the parallels:

library(fixest)

data(base_did)

lm_mod <- lm(y ~ x1 + factor(period) * factor(treat), base_did)

feols_mod <- feols(y ~ x1 + factor(period) + i(period, treat), base_did)

coef(lm_mod)["x1"]

#>        x1 
#> 0.9799697

coef(feols_mod)["x1"]
#>        x1 
#> 0.9799697

Please note that I only answered the part of your question about parallels between lm and feols. StackOverflow is a programming Q&A site. If you have questions about the proper specification of a statistical model, you might want to ask on CrossValidated.

Vincent
  • 15,809
  • 7
  • 37
  • 39
  • 1
    Thanks, Vincent. I looked at the documentation again and found that the i() function has its own page in the documentation, which indeed explains this! As for StackOverflow vs CrossValidated - you're right that SO is for programming and CV for statistical Qs. I guess this is not the place for this particular discussion, but as a student, I often find myself a bit lost on where to go to resolve stats /metrics problems concerning R. I kinda wish there was a site that combined StackOverflow and CrossValidated :) – Tomas R Nov 04 '21 at 08:19