2

I'm an R noob which might be reflected in the not so dense code - so please bear. I'm trying to estimate coefficients for a bivariate normal distribution using max. likelihood estimation. I receive errors related to the Hessian when calling the OPTIM function. I've tried debugging quite a bit but seem unable to get rid of the error. Greatly appreciate any insights you might have on how to tackle this.

The data I'm using is {y1,y2,x1,x2} where y1,y2 are binary variables. The code I use to simulate the data is below:

x1=rnorm(1000)*2+3
x2=rnorm(1000)-0.5

mu=c(0,0)
sigma=array(c(1,0.5,0.5,1),c(2,2)) # correlation matrix
e=mvrnorm(n = 1000, mu, sigma) #MASS package


z1=1+0.5*x1+x2+e[,1]
y1=1*(z1>=0)

z2=0.8+0.3*x1+1.2*x2+e[,2]
y2=1*(z2>=0)

The parameters I'm trying to estimate are the betas in the latent utility functions z1 and z2, and the off-diagonal elements in the variance-covariance matrix.

Thanks!

I first specify the errors and then provide the code after the errors:

First the Errors which seems to originate at this line within the code:

mle = optim(theta.start,logl,x=x,y1=y1,y2=y2,hessian=T)  #Error@Here.  

A)If I set the hessian = F in the parameters in call to OPTIM, I get the following error and traceback:

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL' 

6 array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), 
    NULL) else NULL) 
5 as.matrix.default(a) 
4 as.matrix(a) 
3 solve.default(mle$hessian) 
2 solve(mle$hessian) 
1 mle.reg(fmla, bvprobitdata) 

B) If I set the hessian = T in the parameters in call to OPTIM, I get the following error and traceback:

Error in solve.default(mle$hessian) : 
  Lapack routine dgesv: system is exactly singular: U[1,1] = 0

3 solve.default(mle$hessian) 
2 solve(mle$hessian) 
1 mle.reg(fmla, bvprobitdata)

Now the code:

 # MLE Estimation of Bivariate Normal with correlation.

    require(Formula)
require(pbivnorm)
#Get probit data
bvprobitdata <- read.csv("/Users/...../yhbi_probitdata.csv", header = TRUE)
head(bvprobitdata,10)

#Bivariate Normal Estimation using MLE

mle.reg = function(fmla,data) {

   # Define the negative log likelihood function
    logl <- function(theta,x,y1,y2){
    y1 <- y1
    y2 <- y2
    x <- x

    #Id <- rep(1,1000)
    #x <- as.matrix(cbind(Id,x1,x2))
    beta1 <- matrix(theta[1:3],3,1)
    beta2 <- matrix(theta[4:6],3,1)
    ro <- theta[7]

    # Calculate CDFs
    temp1 <- as.matrix(cbind((x%*%beta1),(x%*%beta2))) # Create a matrix of the two cross products
    bvCDF <- pbivnorm(temp1,rho=ro) # Bivariate CDF
    xb1CDF <- pnorm(x%*%beta1)   
    Negxb1CDF <- pnorm(-(x%*%beta1))

    # Calculate Log Likelihood  - Temporarily commented out to focus debugging error in Hessian in OPTIM.
    #llik <- y1*y2*bvCDF + y1(1-y2)*log(xb1CDF-bvCDF) + (1-y1)*log(Negxb1CDF) #Calc log likelihood    
    #loglik <- sum(llik) # Sum up the log likelihoods for each observation.    
    #return(-loglik) # -ve Since OPTIM minimizes and we want to maximize loglikelihood.

    return(100)
    }


  # Prepare the data
  fml <- model.frame(fmla, data =data)
  fml
  outcome1 = rownames(attr(terms(fmla),"factors"))[1]
  outcome2 = rownames(attr(terms(fmla),"factors"))[2]
  head(data,10)
  print(outcome2)
  dfrTmp = model.frame(data)

  y1 = as.numeric(as.matrix(data[,match(outcome1,colnames(data))]))
  y2 = as.numeric(as.matrix(data[,match(outcome2,colnames(data))]))
  x = as.matrix(model.matrix(fmla, data=dfrTmp))

  # Define initial values for the parameters
  theta.start = cbind(1,1,1,1,1,1,0.5)
  # Assign names to the parameters
  names(theta.start)[1] = "b10"
  names(theta.start)[2] = "b11"
  names(theta.start)[3] = "b12"
  names(theta.start)[4] = "b20"
  names(theta.start)[5] = "b21"
  names(theta.start)[6] = "b22"
  names(theta.start)[7] = "ro"


  # Calculate the maximum likelihood
  mle = optim(theta.start,logl,x=x,y1=y1,y2=y2,hessian=T)  #Error@Here.  
  out = list(beta=mle$par,vcov=solve(mle$hessian),ll=2*mle$value)
}

print("before call")
fmla <- Formula(y1 | y2 ~x1+x2) #Create model formula
mlebvprobit = mle.reg(fmla,bvprobitdata) #Estimate coefficients for probit 
print("after call")
mlebvprobit
Amir
  • 10,600
  • 9
  • 48
  • 75
imfundo
  • 43
  • 4

2 Answers2

0

Have you tried profiling the likelihood? Typically this happens when there is not enough data for the number of parameters, so the hessian cannot be inverted or cholesky transformed.

Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
0

This is a bit delayed but you want to take the square root of the diagonal of the negative hessian as your fisher information matrix will give you the SSE of the model.

sqrt(solve(-mle$hessian))
Hong Ooi
  • 56,353
  • 13
  • 134
  • 187