I have a problem using the kclass() function of the RCompAngrist package when I have missing values in my data frame. The function is supposed to compute a "limited information maximum likelihood" (LIML) estimator. Its formula has a dependent variable on the left-hand side and two parts on the right-hand side: the first for the endogenous variables and the second for the instruments. It is based on the ivreg() function of the AER package. Below is a minimal working example that reproduces the error.
library(magrittr)
library(devtools)
# Current devtools versions expect "user/repo" instead of a separate username argument
install_github("MatthieuStigler/RCompAngrist",
               subdir = "RcompAngrist")
library(RcompAngrist)
a <- runif(10, 5, 90)
b <- runif(10, 4, 10)
c <- runif(10, 0, 1)
d <- runif(10, 5, 65)
e <- runif(10, 1, 2)
f <- runif(10, 1, 100)
g <- runif(10, 80, 90)
h <- c(1, 12, 3, 5, NA, 16, 17, NA, 9, 10)  # two missing values
df <- data.frame(a, b, c, d, e, f, g, h)
dummy <- kclass(a ~ b + c + d | d + e + f + g + h,
                model = TRUE,
                data = df)
If you run this code you should get this error message from R:
Error in cbind(x_exo, z, x_endo, y) : number of rows of matrices must match (see arg 2)
It has to do with the NAs in the data frame, but I can't figure out what exactly is going wrong. It works if you create the variable "h" without the NAs. However, if you omit the NAs via
df <- data.frame(a,b,c,d,e,f,g,h) %>% na.omit()
before you reestimate the model, R gives me this error message:
Error in R_Z[c(n_G, n_y), c(n_G, n_y)] : subscript out of bounds
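A quick sanity check, sketched with base R only (na.omit() is called directly instead of through the pipe, and set.seed() is added purely for reproducibility), suggests that the cleaned data frame itself is fine, so the second error does not seem to come from leftover NAs:

```r
set.seed(1)  # assumption: only here so the sketch is reproducible
a <- runif(10, 5, 90)
b <- runif(10, 4, 10)
c <- runif(10, 0, 1)
d <- runif(10, 5, 65)
e <- runif(10, 1, 2)
f <- runif(10, 1, 100)
g <- runif(10, 80, 90)
h <- c(1, 12, 3, 5, NA, 16, 17, NA, 9, 10)

# Same result as data.frame(...) %>% na.omit()
df <- na.omit(data.frame(a, b, c, d, e, f, g, h))

nrow(df)                           # 8 rows remain
anyNA(df)                          # FALSE, no NAs left
as.integer(attr(df, "na.action"))  # 5 8, the rows that were dropped
```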
I also don't understand why it doesn't omit the NAs on its own, since the global option for na.action is na.omit. It gets even weirder, though. If you delete "data=df" from the function call and then rerun the model, the error message switches back to
Error in cbind [...]
Why does it make any difference here if "data=df" is in the code or not? Does anyone have any ideas where the problem could lie? I don't get at all what's going wrong here.
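For reference, here is a minimal sketch (base R only, not calling kclass()) of how the na.action option can be checked and how many rows model.frame() keeps for each part of the formula. That the two parts end up with different row counts inside kclass() is only my assumption about where the cbind error might come from. I have not verified it against the package source, and the reduced formula and the names mf_reg and mf_inst are just for illustration:

```r
set.seed(1)  # assumption: only here so the sketch is reproducible
a <- runif(10, 5, 90)
b <- runif(10, 4, 10)
h <- c(1, 12, 3, 5, NA, 16, 17, NA, 9, 10)
df <- data.frame(a, b, h)

# Global NA handling that model.frame() falls back on
getOption("na.action")

# na.action is passed explicitly so the sketch does not depend on the session options
# Regressor part: no NAs involved, so all rows are kept
mf_reg <- model.frame(a ~ b, data = df, na.action = na.omit)
# Instrument part: h has two NAs, so two rows are dropped
mf_inst <- model.frame(~ h, data = df, na.action = na.omit)

nrow(mf_reg)   # 10
nrow(mf_inst)  # 8
```

If the pieces of the formula were evaluated separately like this, the resulting matrices would have 10 and 8 rows, which is the kind of mismatch cbind() complains about.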