I'm running a difference-in-differences (DiD) study to estimate the effect of student grants on enrollment. There are 12 provinces, and 3 of them introduced student grants from 2016 onward. I need to test whether the parallel trends assumption holds. When I run the regression with standard errors clustered by province, the treatment-by-year coefficients are negative in the pre-treatment period, and in the post-treatment period most of them (all except 2018) are not statistically significant.
Also, when I plot the coefficients, the pre-treatment values are negative. I assume this indicates that the two groups are not comparable and that the model suffers from perfect multicollinearity.
Is that the case? Or could there be a problem in the code or the variables?
PROCESS:
Create a pre/post intervention dummy called "after_change", which takes the value 1 if the year is 2016 or later and 0 otherwise:
library(dplyr)  # provides %>% and mutate()
# 1 if the year is 2016 or later, 0 otherwise (numeric comparison, numeric result)
data3 <- data3 %>%
  mutate(after_change = as.numeric(year >= 2016))
Create a dummy variable for the treatment (= 1) and control (= 0) groups. If the province is 8, 10, or 11, the dummy is 1 (treatment group); if the province is 1, 2, 3, 4, 5, 6, 7, 9, or 12, the dummy is 0 (control group):
# 1 for the treated provinces (8, 10, 11), 0 for the control provinces
data3 <- data3 %>%
  mutate(province_dummy = as.numeric(province %in% c(8, 10, 11)))
pacman::p_load(fixest, ggplot2)  # feols() for the regression, ggplot() for the plots
# create indicators for the treatment group and the pre/post periods
dataf <- data3 %>%
  mutate(post  = year >= 2016,
         pre   = year < 2016,
         treat = province_dummy == 1)
Regression with standard errors clustered by province:
didreg = feols(enroll ~ treat*as.factor(year), cluster =~ province, data = dataf)
summary(didreg)
OLS estimation, Dep. Var.: enroll
Observations: 1,147,797
Standard-errors: Clustered (province)
                               Estimate Std. Error    t value   Pr(>|t|)
(Intercept)                    0.696301   0.001210 575.382840  < 2.2e-16 ***
treatTRUE                     -0.007199   0.004052  -1.776817 1.0323e-01
as.factor(year)2011            0.002674   0.002349   1.138227 2.7921e-01
as.factor(year)2012            0.007658   0.002573   2.976554 1.2597e-02 *
as.factor(year)2013            0.006626   0.001675   3.956681 2.2468e-03 **
as.factor(year)2014            0.010876   0.001882   5.778537 1.2310e-04 ***
as.factor(year)2015            0.007739   0.002004   3.862600 2.6414e-03 **
as.factor(year)2016            0.012949   0.001560   8.298654 4.6024e-06 ***
as.factor(year)2017            0.014112   0.002943   4.794603 5.5816e-04 ***
as.factor(year)2018            0.017982   0.002331   7.713993 9.2182e-06 ***
as.factor(year)2019            0.018066   0.001410  12.812988 5.9136e-08 ***
treatTRUE:as.factor(year)2011 -0.009903   0.004360  -2.271484 4.4192e-02 *
treatTRUE:as.factor(year)2012 -0.010737   0.003663  -2.930719 1.3673e-02 *
treatTRUE:as.factor(year)2013 -0.010446   0.003502  -2.982775 1.2458e-02 *
treatTRUE:as.factor(year)2014 -0.009357   0.004575  -2.045244 6.5510e-02 .
treatTRUE:as.factor(year)2015 -0.003574   0.003624  -0.986359 3.4516e-01
treatTRUE:as.factor(year)2016  0.007486   0.007415   1.009658 3.3435e-01
treatTRUE:as.factor(year)2017  0.008532   0.005418   1.574857 1.4359e-01
treatTRUE:as.factor(year)2018  0.008883   0.003855   2.304443 4.1705e-02 *
treatTRUE:as.factor(year)2019  0.010301   0.006955   1.481036 1.6666e-01
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.456241 Adj. R2: 3.696e-4
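A note on reading this table: because `treat*as.factor(year)` omits the first year, each interaction coefficient is measured relative to 2010, not relative to the last pre-treatment year. The sketch below re-expresses the reported point estimates relative to 2015 by subtracting the 2015 interaction coefficient from each one. It uses only the numbers printed above; the names `rel2010` and `rel2015` are illustrative, and the standard errors cannot be re-based this way (they would need a re-estimated model).

```r
# Interaction coefficients copied from the output above (relative to 2010).
# 2010 is the omitted reference year, so its value is 0 by construction.
rel2010 <- c(`2010` = 0,
             `2011` = -0.009903, `2012` = -0.010737, `2013` = -0.010446,
             `2014` = -0.009357, `2015` = -0.003574, `2016` = 0.007486,
             `2017` = 0.008532, `2018` = 0.008883, `2019` = 0.010301)

# Re-base to 2015: changing the reference year in a fully interacted
# treat-by-year model shifts every interaction coefficient by the same constant.
rel2015 <- rel2010 - rel2010["2015"]
print(round(rel2015, 6))
```

fixest's `i()` accepts a `ref` argument, so a term such as `i(year, province_dummy, ref = 2015)` should set the reference year at estimation time and return matching clustered standard errors.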
2- event study
OLS_event <- data.frame(year = c(2011:2019),
point = didreg$coefficients[12:20],
sd.error = sqrt(diag(vcov(didreg)))[12:20]) %>%
mutate(ymin = point - sd.error,
ymax = point + sd.error)
plot_Q5 <- ggplot(OLS_event, aes(x = year, y = point)) +
geom_point(size = 0.5, color = "Black") +
geom_vline(xintercept = 2016) +
geom_pointrange(aes(ymin = ymin, ymax = ymax), size = 0.5, color = "Black") +
labs(y = "Treated coefficient for each year", x = "Year", title = "Coefficient view") +
theme_bw()
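The positional indexing `[12:20]` above silently breaks if the formula changes, so a possible alternative is to pick the interaction terms by name and add the omitted reference year to the plot data explicitly. The sketch below is only an illustration under stated assumptions: it runs on simulated stand-in data (`sim`, not the data from this question), fits the same formula with base R `lm()` so that it is self-contained, and uses approximate 95% intervals (1.96 standard errors) instead of the one-standard-error bars used above. `coef()` and `vcov()` are also available for `feols` objects, so the same selection should carry over to `didreg`.

```r
# Simulated stand-in data (hypothetical; not the data from the question)
set.seed(1)
sim <- expand.grid(year = 2010:2019, province = 1:12, id = 1:50)
sim$treat  <- sim$province %in% c(8, 10, 11)
sim$enroll <- rbinom(nrow(sim), 1, 0.7)

fit <- lm(enroll ~ treat * as.factor(year), data = sim)

# Select the treat-by-year interaction terms by name rather than by position
est  <- coef(fit)
se   <- sqrt(diag(vcov(fit)))
keep <- grepl("^treatTRUE:as\\.factor\\(year\\)", names(est))

event <- data.frame(
  year     = as.integer(sub(".*\\)", "", names(est)[keep])),  # trailing year
  point    = unname(est[keep]),
  sd.error = unname(se[keep])
)

# Add the omitted reference year (2010) explicitly, with a coefficient of 0
event <- rbind(data.frame(year = 2010L, point = 0, sd.error = 0), event)
event$ymin <- event$point - 1.96 * event$sd.error  # approximate 95% interval
event$ymax <- event$point + 1.96 * event$sd.error
print(event)
```

The resulting `event` data frame has the same columns as `OLS_event`, so it can be passed to the same `ggplot()` call.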
I expected the groups to be comparable, because when I plot the yearly group means with:
ggplot(dataf, aes(year, enroll, color = treat)) +
stat_summary(geom = 'line') +
geom_vline(xintercept = 2016) +
theme_minimal()
I get values above zero, which reflect the mean enrollment of the two groups. For example, the pre-treatment (after_change = 0) means are:
province_dummy  after_change  enroll
0               0             0.7022814
1               0             0.6877263
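As a quick consistency check between these means and the regression output, the sketch below rebuilds the treated-minus-control gap for each pre-treatment year from the reported coefficients (`treatTRUE` plus the matching interaction) and compares its average with the raw difference in means above. All numbers are copied from this post; the variable names are illustrative. The average is unweighted across years, so a small discrepancy with the observation-weighted group means is to be expected.

```r
# Values copied from the regression output in this post
treat_2010_gap <- -0.007199  # treatTRUE: treated minus control in 2010
inter_pre <- c(-0.009903, -0.010737, -0.010446, -0.009357, -0.003574)  # 2011-2015

# Treated-minus-control gap in each pre-treatment year, 2010 to 2015
gap_by_year <- treat_2010_gap + c(0, inter_pre)

# Unweighted average of the yearly gaps vs. the raw difference in group means
avg_gap_regression <- mean(gap_by_year)
raw_gap_means      <- 0.6877263 - 0.7022814
print(c(regression = avg_gap_regression, means = raw_gap_means))
```

Both numbers come out at roughly -0.0145, which suggests that the negative coefficients describe differences between the groups (and relative to 2010), while the plotted means are levels.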