I would appreciate any insights into staggered DiD (difference-in-differences) models.

I wanted to ask whether I am using the correct function to set up the model for a DiD (data structure provided below):

sample$did <- sample$time * sample$treated

didreg <- lm(y ~ time + treated + did + x + factor(year) + factor(firm), data = sample)

The data looks like:

[Image of the data not reproduced here]

Rui Barradas

1 Answer

I'm not familiar with difference-in-differences modelling, but from skimming the Wikipedia article it seems that what you want is a simple interaction. To fit that, you don't even need to calculate a new variable (did); you can specify the interaction directly in the model. There are a couple of ways to do that with R formula syntax:

# Simple main-effects model, no interactions
main_mod <- lm(y ~ time + treated + x + factor(year) + factor(firm), data = sample)

# Model with the interaction effect explicitly specified
did_mod1 <- lm(y ~ time + treated + time:treated + x + factor(year) + factor(firm), data = sample)

# Model with shortened syntax for specifying interactions
did_mod2 <- lm(y ~ time * treated + x + factor(year) + factor(firm), data = sample)

did_mod1 and did_mod2 are identical; did_mod2 is just a more compact way of writing the same model. The * operator tells R to include both the main effects and the interaction of the variables on its left and right, so time * treated expands to time + treated + time:treated. Since it is generally recommended to include the main effects whenever you fit an interaction, the second form saves time and space.
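
If you want to convince yourself of the equivalence, here is a minimal sketch on simulated data. Everything in it is an assumption made up for illustration: the data frame sample_df, the number of firms and years, the treatment cutoff, and the effect sizes are not taken from the question's data. The firm and year fixed effects are left out to keep the sketch small. It checks that the explicit and the compact formulas give the same coefficients, and that a hand-built did column gives the same interaction estimate.

```r
# Hypothetical simulated panel: 20 firms observed over 6 years.
# All names and values below are assumptions for illustration only.
set.seed(1)
sample_df <- expand.grid(firm = 1:20, year = 2001:2006)
sample_df$treated <- as.integer(sample_df$firm <= 10)    # first 10 firms are treated
sample_df$time    <- as.integer(sample_df$year >= 2004)  # post-treatment period
sample_df$x       <- rnorm(nrow(sample_df))              # a control variable
sample_df$y <- 1 + 0.5 * sample_df$x +
  2 * sample_df$time * sample_df$treated + rnorm(nrow(sample_df))

# Hand-built interaction column, as in the question
sample_df$did <- sample_df$time * sample_df$treated

mod_manual   <- lm(y ~ time + treated + did + x, data = sample_df)
mod_explicit <- lm(y ~ time + treated + time:treated + x, data = sample_df)
mod_compact  <- lm(y ~ time * treated + x, data = sample_df)

# Explicit and compact formulas: same coefficients (matched by name)
same_formulas <- isTRUE(all.equal(
  coef(mod_explicit)[names(coef(mod_compact))],
  coef(mod_compact)
))

# Hand-built did column: same estimate as the time:treated term
same_did <- isTRUE(all.equal(
  unname(coef(mod_manual)["did"]),
  unname(coef(mod_compact)["time:treated"])
))

print(c(same_formulas = same_formulas, same_did = same_did))
```

In all three models the coefficient to read off as the DiD estimate is the interaction term (did or time:treated).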

Adam B.