I am fitting a diff-in-diff model on panel data with multiple treatment windows using the plm package.
In the plm package, the model argument can be set to model = "within" or model = "fd" (first difference), among other options.
Why don't the following produce equivalent coefficient estimates?
- a regression with model = "within" when my variable is already differenced, $Y_\textit{diff} = (Y_{t_i} - Y_{t_{i-1}})$,
- a regression with model = "fd" when my variable is untransformed, $Y_{t_i}$.
I cannot quite understand why the estimates are not identical:
- Why might this be happening?
- What is the correct use of the "model" parameter and, crucially:
  - When should *model = "fd"* be used vs. *model = "within"* (especially when dealing with "change variables" such as $Y_\textit{diff}$ on the LHS)?
  - In your opinion, which of the two would be most appropriate here, and how would you decide?
- Finally, why are "twoways" fixed effects no longer allowed when using model = "fd"?
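To make the comparison concrete, here is how I understand the two specifications, written for a generic regressor vector $X_{it}$ (county $i$, year $t$) with $\Delta Y_{it} = Y_{it} - Y_{i,t-1}$ playing the role of $Y_\textit{diff}$. The symbols $\beta$, $\gamma$, $\varepsilon_{it}$ and $u_{it}$ are my own shorthand, not plm output, so treat this as a sketch under that reading rather than a statement of what plm does internally:

$$
\begin{aligned}
\text{model = "fd" on } Y:\quad & \Delta Y_{it} = (X_{it} - X_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}) \\
\text{model = "within" on } Y_\textit{diff}:\quad & \Delta Y_{it} - \overline{\Delta Y}_{i} = (X_{it} - \bar{X}_{i})'\gamma + (u_{it} - \bar{u}_{i})
\end{aligned}
$$

If this reading is right, the first regression differences the regressors as well as the outcome, whereas the second keeps the regressors in (demeaned) levels, so $\beta$ and $\gamma$ would be coefficients of two different models.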
Some background
In case it may be relevant, here is a high-level description of the current panel:
- I am currently using Panel Data that spans 27 years at the county/district level
- Counties are clustered into groups for the entire window (non-random assignment, e.g. urban/rural)
- It is a repeated measures design. Every few years, treatment occurs at the group level by random assignment - where all entities in a particular group either receive Treatment-X or Treatment-Y.
- While entities are not randomly assigned to groups, treatment is randomly assigned.
- Each of the two groups undergoes 2 exposures to Treatment-X and 2 exposures to Treatment-Y over the course of the entire time window, resulting in 4 pre-post measures.
Code and Sample Outputs
The code and outputs are included below. As you can see, the estimate for the diff-in-diff term from the "first difference" model is quite different from the estimate from the "within" models, even though the "within" models are run after differencing the Y variable. This holds whether the effects are specified as "individual" or "twoways".
To facilitate accurate statistical tests, we rely upon the lmtest and sandwich packages to produce standard errors that are robust to clustering at county-level using a helper function (code shown at the end).
Regression #1 - First-Difference & Individual Fixed Effects:
*Regression on Y with model = "fd" | Effects: Individual*
Note: effect = "twoways" is not allowed for first-difference models.

    library(plm)

    # First-difference model on the untransformed outcome Y
    PLM__Y__model.FD__effect.individual <-
      plm(Y ~ Pre.Post.Treatment * Treatment.or.Control,
          data = Panel, index = c("GEOID", "Year"),
          model = "fd", effect = "individual")
    get.coef.test.with.clustered.SEs(PLM__Y__model.FD__effect.individual)

Regression #2 - Model Within & Two-way Fixed Effects on Y.diff:
*Regression on Y.diff with model = "within" | Effects: Two-Way*

    # Within model with two-way effects on the pre-differenced outcome diff.Y
    PLM__diff.Y__model.within__effect.twoways <-
      plm(diff.Y ~ Pre.Post.Treatment * Treatment.or.Control,
          data = Panel, index = c("GEOID", "Year"),
          model = "within", effect = "twoways")
    get.coef.test.with.clustered.SEs(PLM__diff.Y__model.within__effect.twoways)

Regression #3 - Model Within & Individual Fixed Effects on Y.diff:
*Regression on Y.diff with model = "within" | Effects: Individual*

    # Within model with individual effects on the pre-differenced outcome diff.Y
    PLM__diff.Y__model.within__effect.individual <-
      plm(diff.Y ~ Pre.Post.Treatment * Treatment.or.Control,
          data = Panel, index = c("GEOID", "Year"),
          model = "within", effect = "individual")
    get.coef.test.with.clustered.SEs(PLM__diff.Y__model.within__effect.individual)

I expected regressions run on the differenced variable $Y_\textit{diff} = (Y_{t_i} - Y_{t_{i-1}})$ using model = "within" to produce the same estimates and standard errors as the panel regressions run on $Y_{t_i}$ using model = "fd".
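In case it helps to reproduce the issue without my panel, here is a minimal base-R sketch of the same comparison on simulated data. Everything in it is made up for illustration (the toy panel, the names `id`, `t`, `x`, `y`, the true slope of 2), and `lm()` on hand-transformed variables stands in for what I understand model = "fd" and model = "within" to do; it does not call plm.

```r
# Toy balanced panel; all names and parameter values are illustrative assumptions.
set.seed(1)
n_units <- 50
n_periods <- 6
panel <- data.frame(
  id = rep(seq_len(n_units), each = n_periods),
  t  = rep(seq_len(n_periods), times = n_units)
)
alpha <- rnorm(n_units)                           # unit fixed effects
panel$x <- alpha[panel$id] + rnorm(nrow(panel))   # regressor correlated with the effects
panel$y <- 2 * panel$x + alpha[panel$id] + rnorm(nrow(panel))

# First differences within unit (rows are sorted by id, then t).
first_diff <- function(v, id) ave(v, id, FUN = function(z) c(NA, diff(z)))
panel$dy <- first_diff(panel$y, panel$id)
panel$dx <- first_diff(panel$x, panel$id)
d <- panel[!is.na(panel$dy), ]                    # drop each unit's first period

# Subtract unit means (the "within" transformation); ave() defaults to the mean.
demean <- function(v, id) v - ave(v, id)

# (a) First-difference estimator: differenced y on differenced x.
b_fd <- coef(lm(dy ~ dx, data = d))[["dx"]]

# (b) Within estimator on the pre-differenced outcome:
#     demeaned dy on demeaned *level* x.
d$dy_dm <- demean(d$dy, d$id)
d$x_dm  <- demean(d$x, d$id)
b_within_on_dy <- coef(lm(dy_dm ~ 0 + x_dm, data = d))[["x_dm"]]

print(c(fd = b_fd, within_on_dy = b_within_on_dy))
```

If the two numbers differ here as well, that would suggest the discrepancy comes from the specification (differenced vs. level regressors) rather than from anything peculiar to my data.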