I'm an R noob which might be reflected in the not so dense code - so please bear. I'm trying to estimate coefficients for a bivariate normal distribution using max. likelihood estimation. I receive errors related to the Hessian when calling the OPTIM function. I've tried debugging quite a bit but seem unable to get rid of the error. Greatly appreciate any insights you might have on how to tackle this.
The data I'm using is {y1,y2,x1,x2} where y1,y2 are binary variables. The code I use to simulate the data is below:
x1=rnorm(1000)*2+3
x2=rnorm(1000)-0.5
mu=c(0,0)
sigma=array(c(1,0.5,0.5,1),c(2,2)) # correlation matrix
e=mvrnorm(n = 1000, mu, sigma) #MASS package
z1=1+0.5*x1+x2+e[,1]
y1=1*(z1>=0)
z2=0.8+0.3*x1+1.2*x2+e[,2]
y2=1*(z2>=0)
The parameters I'm trying to estimate are the betas in the latent utility functions z1 and z2, and the off-diagonal elements in the variance-covariance matrix.
Thanks!
I first specify the errors and then provide the code after the errors:
First the Errors which seems to originate at this line within the code:
mle = optim(theta.start,logl,x=x,y1=y1,y2=y2,hessian=T) #Error@Here.
A)If I set the hessian = F in the parameters in call to OPTIM, I get the following error and traceback:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
6 array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),
NULL) else NULL)
5 as.matrix.default(a)
4 as.matrix(a)
3 solve.default(mle$hessian)
2 solve(mle$hessian)
1 mle.reg(fmla, bvprobitdata)
B) If I set the hessian = T in the parameters in call to OPTIM, I get the following error and traceback:
Error in solve.default(mle$hessian) :
Lapack routine dgesv: system is exactly singular: U[1,1] = 0
3 solve.default(mle$hessian)
2 solve(mle$hessian)
1 mle.reg(fmla, bvprobitdata)
Now the code:
# MLE Estimation of Bivariate Normal with correlation.
require(Formula)
require(pbivnorm)
#Get probit data
bvprobitdata <- read.csv("/Users/...../yhbi_probitdata.csv", header = TRUE)
head(bvprobitdata,10)
#Bivariate Normal Estimation using MLE
mle.reg = function(fmla,data) {
# Define the negative log likelihood function
logl <- function(theta,x,y1,y2){
y1 <- y1
y2 <- y2
x <- x
#Id <- rep(1,1000)
#x <- as.matrix(cbind(Id,x1,x2))
beta1 <- matrix(theta[1:3],3,1)
beta2 <- matrix(theta[4:6],3,1)
ro <- theta[7]
# Calculate CDFs
temp1 <- as.matrix(cbind((x%*%beta1),(x%*%beta2))) # Create a matrix of the two cross products
bvCDF <- pbivnorm(temp1,rho=ro) # Bivariate CDF
xb1CDF <- pnorm(x%*%beta1)
Negxb1CDF <- pnorm(-(x%*%beta1))
# Calculate Log Likelihood - Temporarily commented out to focus debugging error in Hessian in OPTIM.
#llik <- y1*y2*bvCDF + y1(1-y2)*log(xb1CDF-bvCDF) + (1-y1)*log(Negxb1CDF) #Calc log likelihood
#loglik <- sum(llik) # Sum up the log likelihoods for each observation.
#return(-loglik) # -ve Since OPTIM minimizes and we want to maximize loglikelihood.
return(100)
}
# Prepare the data
fml <- model.frame(fmla, data =data)
fml
outcome1 = rownames(attr(terms(fmla),"factors"))[1]
outcome2 = rownames(attr(terms(fmla),"factors"))[2]
head(data,10)
print(outcome2)
dfrTmp = model.frame(data)
y1 = as.numeric(as.matrix(data[,match(outcome1,colnames(data))]))
y2 = as.numeric(as.matrix(data[,match(outcome2,colnames(data))]))
x = as.matrix(model.matrix(fmla, data=dfrTmp))
# Define initial values for the parameters
theta.start = cbind(1,1,1,1,1,1,0.5)
# Assign names to the parameters
names(theta.start)[1] = "b10"
names(theta.start)[2] = "b11"
names(theta.start)[3] = "b12"
names(theta.start)[4] = "b20"
names(theta.start)[5] = "b21"
names(theta.start)[6] = "b22"
names(theta.start)[7] = "ro"
# Calculate the maximum likelihood
mle = optim(theta.start,logl,x=x,y1=y1,y2=y2,hessian=T) #Error@Here.
out = list(beta=mle$par,vcov=solve(mle$hessian),ll=2*mle$value)
}
print("before call")
fmla <- Formula(y1 | y2 ~x1+x2) #Create model formula
mlebvprobit = mle.reg(fmla,bvprobitdata) #Estimate coefficients for probit
print("after call")
mlebvprobit