
I need to estimate a difference-in-difference regression to understand the effect of a policy on various municipalities. I have a dataset that spans from 2001 to 2019, covering almost 7,900 municipalities. I am using R for the analysis.

I have a column indicating the precise year of policy implementation for each municipality, and this variable is called "TIME". The dependent variable I want to test is called "INCOME_GR", which represents the annual income growth of residents in the municipality.

The policy I want to test was not applied uniformly across all territories. The application of this policy can vary in different percentages for each territory, making it continuous. The variable "Perc_Policy" indicates the frequency of policy application in each territory, which serves as the "treatment." This value ranges from 0 to 1, where 1 indicates that the entire territory is subject to the policy, while lower frequencies indicate smaller portions of the territory affected. For example, a value of 0.35 indicates that 35% of the municipal territory is subject to the policy. A value of 0 means there is no treatment.

The year of policy implementation can also vary because not all municipalities implemented it in the same year, making it "staggered."

The "treated" group in the first tested year of the policy consists of 420 municipalities. In the last tested year, the treated group comprises 1,243 municipalities. I have a minimum of 3 years of pre-treatment estimation and a minimum of 10 years of post-treatment estimation.

When estimating the DID (difference-in-differences) regression:

DID_INC_GR <- plm(formula = INCOME_GR ~ Perc_Policy * TIME, data = My_Data,
    weights = WEIGHT, effect = "twoways", model = "within")

I have highly significant results (***).

The fact that this policy was implemented in different time periods across municipalities, rather than in a common time period for all municipalities, can introduce statistical challenges that hinder the interpretability of the results. To overcome this issue, I applied a methodology called "group-time average treatment effect" using the "did" package.

The "did" estimation allows for the decomposition of the treatment effect over the years, providing a more robust Average Treatment Effect. The results are significant, especially for the early years of intervention.
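For readers unfamiliar with the approach, here is a minimal sketch of a group-time average treatment effect estimation with the "did" package. It runs on simulated data, not on my dataset: the columns id, year and first_treat and all values are hypothetical. Note that att_gt() works with binary treatment timing (first_treat is the adoption year, 0 for never-treated units), so this sketch does not use the Perc_Policy intensity.

```r
library(did)

# Simulated staggered-adoption panel (hypothetical data).
set.seed(42)
n_units <- 200
panel   <- expand.grid(id = seq_len(n_units), year = 2001:2010)

# Adoption year per unit; 0 marks never-treated units.
cohort            <- sample(c(0, 2004, 2006), n_units, replace = TRUE)
panel$first_treat <- cohort[panel$id]

# Outcome: noise plus a constant effect of 0.02 once a unit is treated.
post            <- panel$first_treat > 0 & panel$year >= panel$first_treat
panel$INCOME_GR <- 0.02 * post + rnorm(nrow(panel), sd = 0.05)

# One ATT per (cohort, year) pair, using never-treated units as controls.
gt <- att_gt(yname = "INCOME_GR", tname = "year", idname = "id",
             gname = "first_treat", data = panel,
             control_group = "nevertreated")

# Aggregate the group-time effects into an event-study profile.
dyn <- aggte(gt, type = "dynamic")
summary(dyn)
```

Passing type = "group" to aggte() instead gives one average effect per adoption cohort.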

The problem arises when I estimate a single implementation year with the two-way within model (for example, only the 2004 cohort, with 420 units in the "treated" group).

Twoways effects Within Model

Call:
plm(formula = INCOME_GR ~ Perc_Policy * TIME, data = Policy_application_2004, 
    weights = WEIGHT, effect = "twoways", model = "within")

Balanced Panel: n = 5656, T = 19, N = 107464

Residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.7750 -0.0097 -0.0001  0.0000  0.0089  6.3058 

Coefficients: (1 dropped because of singularities)
               Estimate  Std. Error t-value  Pr(>|t|)    
Perc_Policy -1.0055e-02  1.3702e-02 -0.7339     0.463    
TIME         5.1844e-06  1.1225e-06  4.6185  3.87e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    440.1
Residual Sum of Squares: 435.01
R-Squared:      0.00031699
Adj. R-Squared: -0.055418
F-statistic: 12.7934 on 2 and 101788 DF, p-value: 2.7836e-06

Two things change compared with the full-sample regression:
  1. Perc_Policy is no longer significant.
  2. Perc_Policy * TIME is dropped due to singularities.

I have also tried adding some confounders to the regression, but the results do not change. So I am facing a singularity problem and I am unsure about its cause. Perc_Policy is fully exogenous and, in theory, has no correlation with INCOME_GR. I considered that a "treated" group of 420 municipalities might be too small for the estimation, but the sample size seems acceptable to me.
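One possible reading of the singularity, assuming TIME holds the implementation year and the control municipalities have Perc_Policy = 0: in a single-cohort subsample, TIME is the same (2004) for every observation with a non-zero Perc_Policy, so the product Perc_Policy * TIME equals 2004 * Perc_Policy and carries no independent variation. A minimal sketch on simulated data (the columns id and year and all values are hypothetical; lm() with unit and year dummies stands in for the two-way within estimator):

```r
# Simulated single-cohort panel (hypothetical data, not the real dataset).
set.seed(1)
n_units <- 50
panel   <- expand.grid(id = seq_len(n_units), year = 2001:2019)

# Units 1-10 adopt in 2004 with a unit-specific share; the rest never do.
share   <- c(runif(10, 0.1, 1), rep(0, n_units - 10))
treated <- panel$id <= 10

# Assumption: TIME stores the implementation year (0 for never-treated units).
panel$TIME        <- ifelse(treated, 2004, 0)
panel$Perc_Policy <- ifelse(treated & panel$year >= 2004, share[panel$id], 0)
panel$INCOME_GR   <- rnorm(nrow(panel))

# With one cohort, the interaction is an exact multiple of Perc_Policy.
same_column <- isTRUE(all.equal(panel$Perc_Policy * panel$TIME,
                                2004 * panel$Perc_Policy))
print(same_column)

# Unit and year dummies mimic the two-way fixed effects.
fit <- lm(INCOME_GR ~ Perc_Policy * TIME + factor(id) + factor(year),
          data = panel)

# The interaction coefficient comes back NA: it is dropped as collinear.
print(coef(fit)[c("Perc_Policy", "Perc_Policy:TIME")])
```

With several cohorts in the sample, TIME differs across treated units, the product is no longer a multiple of Perc_Policy, and the interaction can be estimated. That would be consistent with the singularity disappearing once other implementation years are added.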

Could someone suggest possible causes and solutions? Could the instability in the single-year estimates also undermine the results I obtained with the previous regressions, which included all years of policy application (and therefore a larger treated group)?

I tried rescaling the variables, for example using logs: the problem persists. I tried different control groups, including some that are closely matched to the treated group: nothing changed. I tried different models, such as first differences or a plain plm: nothing changed.

So far, the singularity problem only disappears when I extend the regression to other years of implementation.

M_B
  • I managed to solve part of the problem by working with the ETWFE model. It gives me estimates for each year and group, so I also control for singularities in specific treatment years. Still, I don't know how to handle a treatment that is not 1 or 0 but continuous (I have a frequency). It could add more interesting explanations to the model. – M_B Jul 02 '23 at 07:43
  • I finally solved the treatment problem by using the ETWFE and DID2S models. Both can also account for a continuous treatment effect. The results are significant, and it is possible to distinguish outliers from true effects. – M_B Jul 02 '23 at 11:26
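As a rough illustration of the ETWFE route mentioned in these comments, here is a minimal sketch using the "etwfe" package on simulated data. The columns id, year and first_treat and all values are hypothetical, and the sketch covers binary treatment timing only: the comments do not say how Perc_Policy was brought into the model, so that step is deliberately left out.

```r
library(etwfe)

# Simulated staggered-adoption panel (hypothetical data).
set.seed(7)
n_units <- 200
panel   <- expand.grid(id = seq_len(n_units), year = 2001:2010)

# Adoption year per unit; 0 marks never-treated units.
cohort            <- sample(c(0, 2004, 2006), n_units, replace = TRUE)
panel$first_treat <- cohort[panel$id]

# Outcome: noise plus a constant effect of 0.02 once a unit is treated.
post            <- panel$first_treat > 0 & panel$year >= panel$first_treat
panel$INCOME_GR <- 0.02 * post + rnorm(nrow(panel), sd = 0.05)

# Extended TWFE: cohort-by-period treatment interactions, no controls (~ 0).
mod <- etwfe(fml = INCOME_GR ~ 0, tvar = year, gvar = first_treat,
             data = panel, vcov = ~id)

# Recover average effects by time since adoption.
es <- emfx(mod, type = "event")
print(es)
```

Because each cohort gets its own set of coefficients, a cohort-specific estimate does not rely on an interaction with the implementation year itself.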
