
I have individual-level data and want to analyze the effect of state-level educational expenditures on individual students' performance. Performance is a binary variable (0 when a student does not pass the test, 1 when they pass). I run the following glm model with standard errors clustered at the state level:

library(miceadds)
df_logit <- data.frame(performance = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0),
                       state = c("MA", "MA", "MB", "MC", "MB", "MD", "MA", "MC", "MB", "MD", "MB", "MC", "MA", "MA", "MA", "MA", "MD", "MA","MB","MA","MA","MD","MC","MA","MA","MC","MB","MB","MD", "MB"),
                       expenditure = c(123000, 123000,654000, 785000, 654000, 468000, 123000,  785000, 654000, 468000, 654000, 785000,123000,123000,123000,123000, 468000,123000, 654000, 123000, 123000, 468000,785000,123000, 123000, 785000, 654000, 654000, 468000,654000),
                       population = c(0.25, 0.25, 0.12, 0.45, 0.12, 0.31, 0.25, 0.45, 0.12, 0.31, 0.12, 0.45, 0.25, 0.25, 0.25, 0.25, 0.31, 0.25, 0.12, 0.25, 0.25, 0.31, 0.45, 0.25, 0.25, 0.45, 0.12, 0.12, 0.31, 0.1),
                       left_wing = c(0.10, 0.10, 0.12, 0.18, 0.12, 0.36, 0.10, 0.18, 0.12, 0.36, 0.12, 0.18, 0.10, 0.10, 0.10, 0.10, 0.36, 0.10, 0.12, 0.10, 0.10, 0.36, 0.18, 0.10, 0.10,0.18, 0.12, 0.12, 0.36, 0.12))


df_logit$performance <- as.factor(df_logit$performance)
                       
glm_clust_1 <- miceadds::glm.cluster(data = df_logit,
                                     formula = performance ~ expenditure + population,
                                     cluster = "state",
                                     family = binomial(link = "logit"))
summary(glm_clust_1) 

Since I cannot rule out the possibility that expenditures are endogenous, I would like to use the share of left-wing parties at the state level as an instrument for education expenditures.

However, I have not found a command in ivtools or other packages that runs two-stage least squares with control variables for a logistic regression with state-level clustered standard errors.

Which commands can I use to extend my logit model with the instrument "left_wing" (also included in the example dataset) and at the same time output the common diagnostics such as the Wu-Hausman test or the weak-instruments test (as ivreg does for OLS)?

Ideally, I would like to adapt the following command to a binary dependent variable and cluster the standard errors at the state level:

iv_1 <- ivreg(performance ~ population + expenditure | left_wing + population, data=df_logit)
summary(iv_1, cluster="state", diagnostics = TRUE)
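
For reference, this is roughly how I would request the clustered diagnostics for the linear ivreg baseline (only a sketch, assuming the AER and sandwich packages; ivreg() needs a numeric 0/1 outcome, so the factor conversion above would have to be undone first):

library(AER)       # ivreg() and summary.ivreg() with a diagnostics option
library(sandwich)  # vcovCL() for cluster-robust covariance matrices

# ivreg() needs a numeric 0/1 outcome, so undo the factor conversion from above
df_logit$performance <- as.numeric(as.character(df_logit$performance))
iv_1 <- ivreg(performance ~ population + expenditure | left_wing + population, data = df_logit)

# weak-instrument and Wu-Hausman diagnostics with state-clustered standard errors
summary(iv_1, vcov. = vcovCL(iv_1, cluster = df_logit$state), diagnostics = TRUE)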
R-User
  • Do you also have only 30 states in your real data? – jay.sf Oct 03 '22 at 09:18
  • When you say instrument, do you mean the effect of % of left-wing parties control at the state level on education expenditures? (i.e. `expenditure:left_wing`) Or in other words the interaction effect of expenditure and left_wing? – Hansel Palencia Oct 03 '22 at 09:23
  • So to get just the interaction you can do `expenditure:left_wing` or if you want to include both the primary and interaction you can do `expenditure*left_wing` inside of your glm.cluster() model `glm_clust_1 <- miceadds::glm.cluster(data=df_logit, formula=performance ~ expenditure*left_wing + population, cluster="state", family=binomial(link = "logit"))` – Hansel Palencia Oct 03 '22 at 09:25
  • There is also this bookdown document around causal analysis that has a list of packages that [fit instrumental variable regression by two-stage least squares](https://bookdown.org/paul/applied-causal-analysis/packages-functions-2.html) – Hansel Palencia Oct 03 '22 at 09:35
  • @Hansel I edited my post to make clear that I want to instrument the expenditure by the share of left_wing representatives in the respective state (take the exogenous part of educational expenditure) instead of including an interaction effect. – R-User Oct 03 '22 at 09:35
  • I would check out the [clusterSEs package](https://cran.r-project.org/web/packages/clusterSEs/clusterSEs.pdf), look at the function `cluster.bs.mlogit` specifically. There is an example there that uses IV regression on a clustered variable (Example 2). – Hansel Palencia Oct 03 '22 at 09:42
  • There is also this book [econometrics with r Chapter 12.1](https://www.econometrics-with-r.org/12.1-TIVEWASRAASI.html) which shows how to implement IV regression in R. In case you wanted a really simple explanation before jumping into the `clusterSEs` package – Hansel Palencia Oct 03 '22 at 09:54

1 Answer


Try this?

require(mlogit)   # not strictly needed for ivprobit() itself
require(ivprobit)

# formula: outcome ~ exogenous | endogenous | complete set of instruments
# (note: ivprobit() expects a numeric 0/1 response rather than a factor)
test <- ivprobit(performance ~ population | expenditure | left_wing + population, data = df_logit)

summary(test)

I wasn't completely sure about the clustering part, but according to this thread on CrossValidated, it might not be necessary. Please take a read and let me know what you think.

Essentially, what I understood is that, because the likelihood for binary data is already fully specified, there is no need to include the clusters. This only holds when your model is "correct", however; if you believe that there is something in the joint distribution that is not accounted for, then you should cluster, though from my reading it doesn't seem possible to implement clustering on an IV logit model in R.
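
If clustering is still wanted, one manual workaround I can think of is a control-function / two-stage residual inclusion approach that reuses glm.cluster from your question. This is only a sketch (the column name stage1_resid is mine, and I have not validated the approach), and note that in the toy data the first stage fits exactly because expenditure and left_wing are constant within state, so it only makes sense on the real individual-level data:

# Stage 1: regress the endogenous regressor on the instrument plus the exogenous control
stage1 <- lm(expenditure ~ left_wing + population, data = df_logit)
df_logit$stage1_resid <- residuals(stage1)  # hypothetical helper column

# Stage 2: logit of the outcome on the regressors plus the first-stage residual,
# with standard errors clustered at the state level
cf_logit <- miceadds::glm.cluster(data = df_logit,
                                  formula = performance ~ expenditure + population + stage1_resid,
                                  cluster = "state",
                                  family = binomial(link = "logit"))
summary(cf_logit)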

In terms of the model itself, there is a really good explanation in this SO question: How can I use the "ivprobit" function in the "ivprobit" package in R?

From my reading as well, there should be almost no difference between the end results of a logit vs. a probit model.

The basic breakdown of the three-part formula, using the variable names from the linked example, is as follows:

y  = d2 = the dichotomous l.h.s. (dependent variable)
x  = ltass + roe + div = the r.h.s. exogenous variables
y1 = eqrat + bonus = the r.h.s. endogenous variables
x2 = ltass + roe + div + gap + cfa = the complete set of instruments (the exogenous variables plus the excluded instruments)
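
Mapped onto your variables (my reading of the correspondence, so please double-check it), the same three slots would be:

# y  = performance             -> the dichotomous l.h.s.
# x  = population              -> the exogenous r.h.s. variable
# y1 = expenditure             -> the endogenous r.h.s. variable
# x2 = left_wing + population  -> the complete set of instruments
ivprobit(performance ~ population | expenditure | left_wing + population, data = df_logit)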

Feel free to comment on/edit/give feedback to this answer, as I'm definitely not an expert in applications of causal analysis and it's been a long time since I've implemented one. I also have not explored the potential post-hoc tests from this final model, so that is still left for completion.

Hansel Palencia
  • Thanks a lot, that helped. But two problems arose: first, after declaring the variable "performance" a factor instead of a numeric variable (which needs to be done as it is a binary instead of a continuous variable), I get the following error message when running ivprobit: "Error in weights * y : non-numeric argument to binary operator". Second, I do not get the diagnostics in the summary command (weak-instrument and Wu-Hausman tests). – R-User Oct 03 '22 at 12:57
  • Yes, I was doing some testing and it seems as though the `probit` model does not produce these tests like the `ivreg()` model would, which is definitely a weakness. There is potential, though, that this is also covered by the `probit` methodology (i.e. Wu-Hausman is a test of model misspecification, but the idea of probit is that you don't have model misspecification to begin with, therefore it is not needed). – Hansel Palencia Oct 03 '22 at 15:29