
I'm running a difference-in-differences study to estimate the effect of student grants on enrollment. There are 12 provinces, and 3 of them introduced student grants from 2016 onward. I need to test whether the parallel trends assumption holds. When I run the regression with clustered standard errors, I get negative treated-group coefficients in the pre-treatment years, and in the post-treatment period the coefficients are mostly not statistically significant.

Also, when I plot the coefficients, the pre-treatment values are negative. I assume this indicates that the two samples are not comparable and suffer from perfect multicollinearity.

Is that the case, or could it be a problem in my code or variables?

PROCESS:

Create a pre/post-intervention dummy called "after_change" that equals 1 if the year is 2016 or later and 0 otherwise:

# after_change = 1 from 2016 onward, 0 before. Comparing with the number 2016
# (not the string "2016") and converting the logical result keeps the column numeric.
data3 <- data3 %>% 
  mutate(after_change = as.numeric(year >= 2016))
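A quick check on a toy year vector (hypothetical values standing in for data3$year) shows what the string-based version of this step does: comparing a number with "2016" compares text, which happens to order four-digit years correctly, but assigning "0" and "1" turns the whole column into character. Converting the logical comparison with as.integer() gives a numeric 0/1 dummy directly.

```r
# Toy years (hypothetical; stands in for data3$year)
year <- c(2010, 2014, 2015, 2016, 2019)

# String-based version: comparisons are done on text and the result is character
after_change_chr <- year
after_change_chr[after_change_chr < "2016"] <- "0"
after_change_chr[after_change_chr >= "2016"] <- "1"

# Numeric version: logical comparison converted to 0/1 integers
after_change_num <- as.integer(year >= 2016)

print(class(after_change_chr))  # [1] "character"
print(after_change_num)         # [1] 0 0 0 1 1
```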

Create a dummy variable for the treatment (= 1) and control (= 0) groups: provinces 8, 10 and 11 form the treatment group, and provinces 1, 2, 3, 4, 5, 6, 7, 9 and 12 form the control group:

# 1 for the treated provinces (8, 10, 11), 0 for the control provinces
data3 <- data3 %>% 
  mutate(province_dummy = as.numeric(province %in% c(8, 10, 11)))
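The treatment dummy only needs a membership test. A minimal sketch with %in%, assuming province codes 1 to 12 as described in the question (the province vector here is hypothetical, one entry per province):

```r
# Hypothetical province codes: one entry per province, 1..12
province <- 1:12
treated_provinces <- c(8, 10, 11)

# TRUE for treated provinces; as.integer() maps TRUE/FALSE to 1/0
province_dummy <- as.integer(province %in% treated_provinces)

print(data.frame(province, province_dummy))
```

One difference from a chain of ifelse() calls: %in% returns FALSE (so 0) for a missing province code instead of NA, which is worth checking if data3$province can contain NA.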

pacman::p_load(dplyr, fixest, ggplot2)  # dplyr: %>% and mutate(), fixest: feols(), ggplot2: plots

# Create logical indicators for the treatment group and the pre/post periods

dataf = data3 %>%
  mutate(post = year >= 2016,
         pre = year < 2016,
         treat = province_dummy == 1)
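Perfect multicollinearity has a visible symptom in R: a coefficient that cannot be estimated. For instance, pre and post as defined here are exact complements, so putting both into a model with an intercept makes the design matrix rank-deficient. A sketch on simulated toy data (all values hypothetical): lm() reports NA for the aliased term, and fixest similarly removes collinear variables and notes this in its output.

```r
set.seed(7)

# Hypothetical toy panel: 20 observations per year, 2010-2019
year <- rep(2010:2019, each = 20)
toy <- data.frame(year = year,
                  post = year >= 2016,
                  pre = year < 2016,
                  enroll = rbinom(length(year), 1, 0.7))

# pre + post equals 1 in every row, duplicating the intercept column
stopifnot(all(toy$pre + toy$post == 1))

# lm() drops the aliased term and reports its coefficient as NA
fit <- lm(enroll ~ post + pre, data = toy)
print(coef(fit))
```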

Regression with standard errors clustered by province:

didreg = feols(enroll ~ treat*as.factor(year), cluster = ~province, data = dataf)
summary(didreg)
OLS estimation, Dep. Var.: enroll
Observations: 1,147,797 
Standard-errors: Clustered (province) 
                               Estimate Std. Error    t value   Pr(>|t|)    
(Intercept)                    0.696301   0.001210 575.382840  < 2.2e-16 ***
treatTRUE                     -0.007199   0.004052  -1.776817 1.0323e-01    
as.factor(year)2011            0.002674   0.002349   1.138227 2.7921e-01    
as.factor(year)2012            0.007658   0.002573   2.976554 1.2597e-02 *  
as.factor(year)2013            0.006626   0.001675   3.956681 2.2468e-03 ** 
as.factor(year)2014            0.010876   0.001882   5.778537 1.2310e-04 ***
as.factor(year)2015            0.007739   0.002004   3.862600 2.6414e-03 ** 
as.factor(year)2016            0.012949   0.001560   8.298654 4.6024e-06 ***
as.factor(year)2017            0.014112   0.002943   4.794603 5.5816e-04 ***
as.factor(year)2018            0.017982   0.002331   7.713993 9.2182e-06 ***
as.factor(year)2019            0.018066   0.001410  12.812988 5.9136e-08 ***
treatTRUE:as.factor(year)2011 -0.009903   0.004360  -2.271484 4.4192e-02 *  
treatTRUE:as.factor(year)2012 -0.010737   0.003663  -2.930719 1.3673e-02 *  
treatTRUE:as.factor(year)2013 -0.010446   0.003502  -2.982775 1.2458e-02 *  
treatTRUE:as.factor(year)2014 -0.009357   0.004575  -2.045244 6.5510e-02 .  
treatTRUE:as.factor(year)2015 -0.003574   0.003624  -0.986359 3.4516e-01    
treatTRUE:as.factor(year)2016  0.007486   0.007415   1.009658 3.3435e-01    
treatTRUE:as.factor(year)2017  0.008532   0.005418   1.574857 1.4359e-01    
treatTRUE:as.factor(year)2018  0.008883   0.003855   2.304443 4.1705e-02 *  
treatTRUE:as.factor(year)2019  0.010301   0.006955   1.481036 1.6666e-01    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.456241   Adj. R2: 3.696e-4
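In a fully interacted treat × year model like this one, each treatTRUE:as.factor(year) coefficient is the treated-minus-control gap in that year minus the same gap in the omitted reference year. as.factor(year) omits its first level, which here is 2010 (there is no 2010 row in the output), so the pre-treatment coefficients are measured relative to 2010 rather than to 2015, the last year before the grants. The sketch below uses simulated data (all values hypothetical) and base R lm(), whose point estimates match feols() for the same formula because clustering only changes the standard errors. It shows that moving the reference year to 2015 shifts every interaction by the same constant. In fixest itself, the i() operator, e.g. i(year, province_dummy, ref = 2015) with a 0/1 dummy, sets the reference year directly.

```r
set.seed(1)

# Hypothetical panel: 12 provinces x 2010-2019; provinces 8, 10, 11 treated from 2016
d <- expand.grid(province = 1:12, year = 2010:2019)
d$treat <- d$province %in% c(8, 10, 11)
d$enroll <- 0.7 + 0.002 * (d$year - 2010) - 0.01 * d$treat +
  0.01 * d$treat * (d$year >= 2016) + rnorm(nrow(d), sd = 0.005)

# Default coding: as.factor(year) omits its first level, 2010
fit_2010 <- lm(enroll ~ treat * as.factor(year), data = d)

# Re-based coding: 2015 (last pre-treatment year) is the omitted level
d$year_f <- relevel(as.factor(d$year), ref = "2015")
fit_2015 <- lm(enroll ~ treat * year_f, data = d)

# Extract the treat x year interactions and name them by year
interactions_by_year <- function(fit) {
  b <- coef(fit)[grep("^treatTRUE:", names(coef(fit)))]
  setNames(b, sub(".*(\\d{4})$", "\\1", names(b)))
}

yrs <- as.character(2010:2019)
b10 <- c(`2010` = 0, interactions_by_year(fit_2010))[yrs]  # 2010 is 0 by construction
b15 <- c(`2015` = 0, interactions_by_year(fit_2015))[yrs]  # 2015 is 0 by construction

comparison <- data.frame(year = 2010:2019,
                         ref_2010 = unname(b10),
                         ref_2015 = unname(b15),
                         ref_2010_minus_2015 = unname(b10 - b10["2015"]))
print(round(comparison, 4))
```

The last two columns agree, so changing the reference year moves the whole pre-treatment profile up or down without changing its shape.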

Event-study coefficients:

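# Interaction terms treatTRUE:as.factor(year) for 2011-2019, selected by name
int_terms <- grep("^treatTRUE:", names(coef(didreg)), value = TRUE)
OLS_event <- data.frame(year = 2011:2019,
                        point = coef(didreg)[int_terms],
                        sd.error = sqrt(diag(vcov(didreg)))[int_terms]) %>% 
  mutate(ymin = point - sd.error,   # +/- 1 standard error band
         ymax = point + sd.error)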
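The OLS_event table has no row for the omitted reference year (2010), and point ± sd.error is a one-standard-error band rather than a 95% interval. A sketch of a helper that adds the reference year as a zero row and builds t-based intervals; event_table() is a hypothetical name, not a fixest function. The estimates and clustered standard errors are copied from the regression output above, and df = 11 (12 province clusters minus 1) is an assumption that matches the reported p-values, e.g. t = -2.27 with p ≈ 0.044 for 2011.

```r
# Hypothetical helper: estimates and standard errors named by year -> plotting table
event_table <- function(est, se, ref_year, df = Inf, level = 0.95) {
  crit <- qt(1 - (1 - level) / 2, df = df)  # df = Inf gives the normal 1.96
  out <- data.frame(year = as.integer(c(names(est), ref_year)),
                    point = c(unname(est), 0),  # reference year is 0 by construction
                    se = c(unname(se), 0))
  out$ymin <- out$point - crit * out$se
  out$ymax <- out$point + crit * out$se
  out[order(out$year), ]
}

# Pre-treatment interaction estimates and clustered SEs from the output above
est <- c(`2011` = -0.009903, `2012` = -0.010737, `2013` = -0.010446,
         `2014` = -0.009357, `2015` = -0.003574)
se  <- c(`2011` = 0.004360, `2012` = 0.003663, `2013` = 0.003502,
         `2014` = 0.004575, `2015` = 0.003624)

tab <- event_table(est, se, ref_year = 2010, df = 11)
print(tab)
```

The same call works for all nine interaction terms, and the result can be passed to ggplot() with geom_pointrange(aes(ymin = ymin, ymax = ymax)) plus geom_hline(yintercept = 0) as a reference line.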


plot_Q5 <- ggplot(OLS_event, aes(x = year, y = point)) +
  geom_point(size = 0.5, color = "Black") +
  geom_vline(xintercept = 2016) +
  geom_pointrange(aes(ymin = ymin, ymax = ymax), size = 0.5, color = "Black") +
  labs(y = "Treated coefficient for each year", x = "Year", title = "Coefficient view") +
  theme_bw()


I expected the groups to be comparable, because when I plot the raw yearly means with:

ggplot(dataf, aes(year, enroll, color = treat)) +
  stat_summary(geom = 'line') +
  geom_vline(xintercept = 2016) +
  theme_minimal()

I get values above zero, which are simply the mean enrollment of each group. The group means before the change are:

   province_dummy after_change    enroll
          0            0        0.7022814
          1            0        0.6877263
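The raw-means plot shows levels of enroll (group means around 0.69 to 0.70, so always positive), whereas the interaction coefficients are differences of between-group gaps, so the two can differ in sign. In the table above, for example, the treated group's pre-period mean is about 0.0146 below the control group's. A sketch on simulated data (hypothetical values throughout) of the basic 2 × 2 version: the difference-in-differences computed from the four cell means equals the treat × post coefficient from the regression.

```r
set.seed(42)

# Hypothetical simulated data: binary enrollment outcome, 3 of 12 provinces treated
n <- 5000
d <- data.frame(province = sample(1:12, n, replace = TRUE),
                year = sample(2010:2019, n, replace = TRUE))
d$treat <- d$province %in% c(8, 10, 11)
d$post <- d$year >= 2016
d$enroll <- rbinom(n, 1, 0.70 - 0.015 * d$treat + 0.01 * d$treat * d$post)

# Cell means: the levels a raw-means plot shows (all between 0 and 1)
m <- tapply(d$enroll, list(treat = d$treat, post = d$post), mean)
print(round(m, 4))

# Difference-in-differences computed by hand from the four cell means
did_means <- (m["TRUE", "TRUE"] - m["TRUE", "FALSE"]) -
             (m["FALSE", "TRUE"] - m["FALSE", "FALSE"])

# Same number from the regression: the treat x post interaction coefficient
fit <- lm(enroll ~ treat * post, data = d)
did_reg <- unname(coef(fit)["treatTRUE:postTRUE"])
print(c(by_hand = did_means, regression = did_reg))
```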
    Hi Fabio, it’s not very clear what your question is - can you please edit your title and post to start with a statement of your specific question? Also, since I think you’re asking about model interpretation, this would probably be more appropriate for [Cross Validated](https://stats.stackexchange.com/). – zephryl Nov 23 '22 at 10:35
