I am fitting a diff-in-diff model on panel data with multiple treatment windows using the plm package.

The plm package lets me set model = "within" or model = "fd" (first difference).

Why don't the following produce equivalent coefficient estimates?

  • a regression with model = "within" when my variable is already differenced, $Y_\textit{diff} = (Y_{t_i} - Y_{t_{i-1}})$
  • a regression with model = "fd" when my variable is untransformed, $Y_{t_i}$
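To make the comparison concrete, here is how I understand the two transformations. This is only a sketch: the notation ($i$ for county, $t$ for year, $X_{it}$ for the regressors, $\alpha_i$ for the county effect) is my own, and I am assuming plm applies the chosen transformation to both sides of the formula.

Starting from

$$Y_{it} = \alpha_i + X_{it}\beta + \varepsilon_{it},$$

model = "fd" on the untransformed outcome would estimate

$$\Delta Y_{it} = \Delta X_{it}\,\beta + \Delta\varepsilon_{it}, \qquad \Delta Y_{it} = Y_{it} - Y_{i,t-1},$$

whereas model = "within" on the already differenced outcome would estimate

$$\Delta Y_{it} - \overline{\Delta Y}_i = \left(X_{it} - \bar{X}_i\right)\gamma + \left(u_{it} - \bar{u}_i\right).$$

If that reading is right, the regressors enter in differences in the first case but in demeaned levels in the second, so $\beta$ and $\gamma$ need not coincide.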

I cannot quite understand why the estimates are not identical:

  • Why might this be happening?
  • What is the correct use of the "model" parameter and, crucially:
    • When should model = "fd" be used vs. model = "within" (especially when dealing with "change variables" $Y_\textit{diff}$ on the LHS)?
    • In your opinion, which of the two would be most appropriate here and how would you decide?
  • Finally, why are "twoways" fixed effects no longer allowed when using model = "fd"?

Some background

In case it may be relevant, here is a high-level description of the current panel:

  1. I am using panel data that spans 27 years at the county/district level.
  2. Counties are clustered into groups for the entire window (non-random assignment, e.g. urban/rural)
  3. It is a repeated measures design. Every few years, treatment occurs at the group level by random assignment - where all entities in a particular group either receive Treatment-X or Treatment-Y.
  4. While entities are not randomly assigned to groups, treatment is randomly assigned.
  5. Each of the two groups undergoes 2 exposures to Treatment-X and 2 exposures to Treatment-Y over the course of the entire time window, resulting in 4 pre-post measures.

Code and Sample Outputs

The code is included below. The estimate for the diff-in-diff term from the model specified with "fd" is quite different from the estimate from the models specified with "within", even though the "within" models are run after differencing the Y variable. This holds whether the effects in the "within" models are specified as "individual" or "twoways".

To facilitate accurate statistical tests, we rely upon the lmtest and sandwich packages to produce standard errors that are robust to clustering at county-level using a helper function (code shown at the end).

Regression #1 - First-Difference & Individual Fixed Effects:

*Regression on Y with model = "fd" | Effects: Individual*

Note: effect = "twoways" is not allowed for first-difference models.


PLM__Y__model.FD__effect.individual <-
  plm(Y ~  Pre.Post.Treatment * Treatment.or.Control
      , data=Panel, index=c("GEOID", "Year")
      , model="fd", effect = "individual")

get.coef.test.with.clustered.SEs(PLM__Y__model.FD__effect.individual)

Regression #2 - Model Within & Two-way Fixed Effects on Y.diff:

*Regression on Y.diff with model = "within" | Effects: Two-Way*

PLM__diff.Y__model.within__effect.twoways <-
  plm(diff.Y ~  Pre.Post.Treatment * Treatment.or.Control
      , data=Panel, index=c("GEOID", "Year")
      , model="within", effect = "twoways")

get.coef.test.with.clustered.SEs(PLM__diff.Y__model.within__effect.twoways)

Regression #3 - Model Within & Individual Fixed Effects on Y.diff:

*Regression on Y.diff with model = "within" | Effects: Individual*

PLM__diff.Y__model.within__effect.individual <-
  plm(diff.Y ~  Pre.Post.Treatment * Treatment.or.Control
      , data=Panel, index=c("GEOID", "Year")
      , model="within", effect = "individual")

get.coef.test.with.clustered.SEs(PLM__diff.Y__model.within__effect.individual)

Regression Outputs

I expected the regressions run on the differenced variable $Y_\textit{diff} = (Y_{t_i} - Y_{t_{i-1}})$ using model = "within" to produce the same estimates and standard errors as the panel regression run on $Y_{t_i}$ using model = "fd".
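A minimal reproducible sketch of the same comparison, assuming the Grunfeld data that ships with plm can stand in for my panel (the variables inv, value and capital and the helper d() are illustrative and are not part of my actual data). It compares model = "fd" against a plain lm() on manually differenced variables, and against model = "within" on a differenced outcome with regressors left in levels:

```r
library(plm)

# Example panel shipped with plm: 10 firms observed over 20 years
data("Grunfeld", package = "plm")
G <- Grunfeld[order(Grunfeld$firm, Grunfeld$year), ]

# Hypothetical helper: first difference within each firm (first year becomes NA)
d <- function(x) ave(x, G$firm, FUN = function(v) c(NA, diff(v)))
G$d_inv     <- d(G$inv)
G$d_value   <- d(G$value)
G$d_capital <- d(G$capital)

# (a) First-difference estimator on the untransformed outcome
fd_fit <- plm(inv ~ value + capital, data = G,
              index = c("firm", "year"), model = "fd")

# (b) OLS on manually differenced outcome AND manually differenced regressors
ols_fit <- lm(d_inv ~ d_value + d_capital, data = G)

# (c) Within estimator on the differenced outcome, regressors kept in levels
within_fit <- plm(d_inv ~ value + capital, data = G,
                  index = c("firm", "year"), model = "within")

# Compare the slope coefficients of the three fits side by side
print(rbind(fd     = coef(fd_fit)[-1],
            ols    = coef(ols_fit)[-1],
            within = coef(within_fit)))
```

In this sketch, (a) and (b) should agree because both difference the outcome and the regressors, while (c) differences only the outcome and then demeans everything, which looks like the situation in my Regressions #2 and #3.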