I'm not familiar with difference-in-differences modelling, but from skimming the Wikipedia article it seems that what you want is a simple interaction between time and treated. To fit that, you don't even need to calculate a new variable (did); you can specify the interaction directly in the model. There are a couple of ways to do that with R formula syntax:
# Simple main effects model, no interactions
main_mod <- lm(y ~ time + treated + x + factor(year) + factor(firm), data = sample)
# Model with the interaction effect explicitly specified
did_mod1 <- lm(y ~ time + treated + time:treated + x + factor(year) + factor(firm), data = sample)
# Model with shortened syntax for specifying interactions
did_mod2 <- lm(y ~ time * treated + x + factor(year) + factor(firm), data = sample)
did_mod1 and did_mod2 are identical; did_mod2 is just a more compact way of writing the same model. The * operator indicates that you want both the main effects and the interaction of the variables to its left and right, so time * treated expands to time + treated + time:treated. The coefficient on time:treated is the difference-in-differences estimate. It's generally recommended to include the main effects whenever you fit an interaction, so the second way of writing the model saves time and space.
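If you want to convince yourself that the two formulas are equivalent, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data frame name sim_data, the sample size, and the coefficient values are invented, and the year and firm fixed effects are left out to keep the example short.

```r
# Simulated data; all names and values here are illustrative assumptions
set.seed(42)
n <- 400
sim_data <- data.frame(
  time    = rbinom(n, 1, 0.5),  # 1 = post-treatment period
  treated = rbinom(n, 1, 0.5),  # 1 = treated group
  x       = rnorm(n)            # a control variable
)

# Outcome with a true interaction (DiD) effect of 2
sim_data$y <- 1 + 0.5 * sim_data$time + 0.3 * sim_data$treated +
  2 * sim_data$time * sim_data$treated + 0.4 * sim_data$x + rnorm(n)

# Explicit interaction term vs. the shorthand with *
did_mod1 <- lm(y ~ time + treated + time:treated + x, data = sim_data)
did_mod2 <- lm(y ~ time * treated + x, data = sim_data)

# Both formulas expand to the same terms, so the coefficients match
all.equal(coef(did_mod1), coef(did_mod2))

# The DiD estimate is the coefficient on the interaction term
coef(did_mod2)["time:treated"]
```

On your real data you would keep factor(year) and factor(firm) in the formula as in the models above; summary(did_mod2) then reports the standard error and p-value for the time:treated row.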