
I'm attempting to fit parameters to a data set using optim() in R. The objective function requires iterative root-solving for an equation G, so that the predicted values p bring G (nested within the objective function) to 0 (or as close to 0 as possible; I use 50 iterations of the bisection method for stability).
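
For concreteness, p is defined implicitly as the root of

    G(p) = -1 + (x1/((log(-p/(p-1))-b1)/m1)) + (x2/((log(-p/(p-1))-b2)/m2)) = 0

where log(-p/(p-1)) is just the logit, log(p/(1-p)), for p in (0,1).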

Here is the problem: I would really prefer to supply an analytical gradient to optim(), but I suspect that isn't possible for an iterated function. Before I give up on the analytical gradient, though, I wanted to run this problem by everyone here and see if there might be a solution I'm overlooking. Any thoughts?

Note: before settling on the bisection method, I tried other root-solving methods, but all non-bracketing methods (Newton, etc.) seem to be unstable.

Below is a reproducible example of the problem. With the provided data set and starting values, optim() converges just fine without an analytical gradient, but it doesn't perform as well for other data sets and starting values.

#the data set includes two input variables (x1 and x2)
#the response values are k successes out of n trials
x1=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  1, 1, 1, 1, 1.5, 1.5, 1.5, 1.5, 1.75, 1.75, 1.75, 1.75, 2, 2, 
  2, 2, 2.25, 2.25, 2.25, 2.25, 2.5, 2.5, 2.5, 2.5, 2.75, 2.75, 
  2.75, 2.75, 3, 3, 3, 3, 3.25, 3.25, 3.25, 3.25, 3.5, 3.5, 3.5, 
  3.5, 3.75, 3.75, 3.75, 3.75, 4, 4, 4, 4, 4.25, 4.25, 4.25, 4.25, 
  4.5, 4.5, 4.5, 4.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1.5, 1.5, 1.5, 1.5, 
  1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 
  1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.75, 1.75, 
  1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 
  1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 
  1.75, 1.75, 1.75, 1.75, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2.25, 2.25, 2.25, 
  2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 
  2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25, 
  2.25, 2.25, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 
  2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 
  2.5, 2.5, 2.5, 2.5, 2.5, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 
  2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 
  2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 
  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
  3, 3, 3, 3, 3, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 
  3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 
  3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.25, 3.5, 3.5, 
  3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 
  3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 
  3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 3.75, 
  3.75, 3.75, 3.75, 3.75, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
  4, 4, 4, 4, 4, 4, 4, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 
  4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 4.25, 
  4.25, 4.25, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 
  4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5)
x2=c(0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 0.15, 0.15, 0.15, 
  0.15, 0.2, 0.2, 0.2, 0.2, 0.233, 0.233, 0.233, 0.267, 0.267, 
  0.267, 0.267, 0.3, 0.3, 0.3, 0.3, 0.333, 0.333, 0.333, 0.333, 
  0.367, 0.367, 0.367, 0.367, 0.4, 0.4, 0.4, 0.4, 0.433, 0.433, 
  0.433, 0.433, 0.467, 0.467, 0.467, 0.5, 0.5, 0.5, 0.5, 0.55, 
  0.55, 0.55, 0.55, 0.6, 0.6, 0.6, 0.6, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 
  0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 
  0.4, 0.467, 0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 0.55, 0.15, 
  0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3, 
  0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433, 
  0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 
  0.2, 0.2, 0.267, 0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 0.333, 
  0.4, 0.4, 0.4, 0.4, 0.467, 0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 
  0.55, 0.15, 0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 
  0.3, 0.3, 0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 
  0.433, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 
  0.1, 0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 0.267, 0.333, 0.333, 
  0.333, 0.333, 0.4, 0.4, 0.4, 0.4, 0.467, 0.467, 0.467, 0.467, 
  0.55, 0.55, 0.55, 0.55, 0.15, 0.15, 0.15, 0.15, 0.233, 0.233, 
  0.233, 0.233, 0.3, 0.3, 0.3, 0.3, 0.367, 0.367, 0.367, 0.367, 
  0.433, 0.433, 0.433, 0.433, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 
  0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 
  0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 0.4, 0.467, 
  0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 0.55, 0.15, 0.15, 0.15, 
  0.15, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3, 0.3, 0.367, 0.367, 
  0.367, 0.367, 0.433, 0.433, 0.433, 0.433, 0.5, 0.5, 0.5, 0.6, 
  0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 
  0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 
  0.4, 0.467, 0.467, 0.467, 0.467, 0.55, 0.55, 0.55, 0.55, 0.15, 
  0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3, 
  0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433, 
  0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.1, 0.1, 0.1, 0.1, 0.2, 
  0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 0.267, 0.333, 0.333, 0.333, 
  0.15, 0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 
  0.3, 0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433, 
  0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.267, 0.267, 0.267, 
  0.267, 0.333, 0.333, 0.333, 0.333, 0.4, 0.4, 0.4, 0.4, 0.15, 
  0.15, 0.15, 0.15, 0.233, 0.233, 0.233, 0.233, 0.3, 0.3, 0.3, 
  0.3, 0.367, 0.367, 0.367, 0.367, 0.433, 0.433, 0.433, 0.433)
k=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 1, 3, 
    3, 3, 3, 3, 3, 4, 2, 5, 3, 4, 7, 8, 5, 4, 5, 5, 4, 5, 5, 5, 6, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 2, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 3, 2, 4, 1, 2, 
    3, 4, 2, 2, 4, 4, 3, 1, 2, 0, 3, 4, 5, 5, 0, 0, 0, 0, 0, 0, 1, 
    0, 0, 1, 1, 2, 1, 2, 2, 0, 3, 1, 0, 2, 4, 6, 5, 5, 4, 5, 5, 5, 
    1, 0, 0, 0, 2, 1, 0, 1, 3, 2, 1, 1, 3, 4, 3, 4, 5, 5, 5, 5, 8, 
    6, 7, 6, 6, 5, 7, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 2, 1, 1, 3, 3, 
    2, 1, 3, 6, 2, 5, 3, 3, 5, 6, 5, 5, 5, 1, 0, 1, 1, 2, 1, 1, 1, 
    3, 4, 2, 5, 5, 3, 4, 4, 6, 4, 6, 5, 6, 5, 5, 5, 5, 4, 5, 5, 0, 
    0, 0, 0, 0, 2, 0, 2, 3, 3, 3, 2, 3, 3, 1, 4, 4, 4, 4, 3, 5, 6, 
    5, 5, 5, 5, 5, 1, 4, 1, 2, 2, 3, 4, 2, 5, 5, 5, 5, 5, 4, 5, 7, 
    6, 7, 6, 5, 5, 5, 7, 5, 5, 5, 5, 5, 0, 1, 0, 0, 3, 2, 3, 3, 1, 
    2, 2, 2, 4, 2, 3, 2, 5, 5, 5, 5, 4, 6, 5, 6, 5, 5, 6, 5, 3, 5, 
    2, 4, 5, 3, 5, 5, 6, 4, 4, 5, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5, 5, 
    5, 5, 5, 0, 0, 2, 0, 3, 2, 3, 2, 3, 4, 3, 4, 5, 5, 5, 5, 6, 4, 
    6, 4, 5, 7, 5, 5, 5, 6, 5, 5, 2, 3, 4, 4, 4, 4, 5, 5, 5, 6, 5, 
    5, 5, 5, 5, 4, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 1, 0, 2, 0, 
    3, 5, 2, 2, 4, 5, 4, 5, 6, 6, 4, 5, 4, 5, 4, 5, 5, 5, 5, 5, 5, 
    6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 4, 1, 4, 4, 4, 4, 4, 3, 6, 5, 
    4, 3, 5, 4, 5, 6, 6, 5, 6, 5, 4, 5, 5, 5, 6, 5, 5, 5, 11, 5, 
    12, 5, 5, 5, 5, 4, 5, 5, 5)
n=c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 
  6, 5, 6, 5, 5, 5, 5, 7, 5, 6, 8, 8, 6, 5, 6, 5, 5, 5, 5, 5, 6, 
  5, 5, 5, 5, 7, 11, 8, 7, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 
  4, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 5, 5, 4, 5, 5, 5, 6, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 6, 6, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 7, 6, 7, 6, 5, 5, 5, 5, 
  5, 5, 5, 5, 5, 6, 5, 6, 6, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 
  8, 6, 7, 6, 6, 5, 7, 5, 5, 5, 5, 6, 5, 5, 5, 7, 7, 6, 5, 6, 5, 
  5, 5, 5, 6, 6, 4, 6, 6, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5, 7, 5, 5, 
  4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 4, 6, 5, 6, 5, 5, 5, 5, 4, 5, 5, 
  5, 5, 6, 6, 5, 6, 5, 4, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 
  6, 5, 5, 5, 5, 5, 5, 6, 5, 6, 7, 4, 6, 5, 5, 5, 5, 5, 5, 4, 5, 
  7, 6, 7, 6, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 5, 
  5, 5, 5, 5, 5, 4, 5, 6, 5, 5, 5, 5, 5, 7, 5, 6, 5, 5, 6, 5, 5, 
  5, 5, 5, 5, 5, 5, 6, 6, 5, 5, 5, 5, 5, 6, 6, 5, 5, 5, 5, 5, 5, 
  5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 
  5, 6, 5, 6, 7, 5, 5, 5, 6, 5, 5, 4, 5, 5, 5, 5, 6, 5, 5, 5, 6, 
  5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 4, 5, 5, 5, 5, 5, 5, 5, 7, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
  5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 
  5, 5, 5, 5, 5, 5, 6, 6, 5, 6, 5, 5, 5, 5, 5, 6, 5, 5, 5, 11, 
  5, 12, 5, 5, 5, 5, 4, 5, 5, 5)

#low_high contains the lower and upper bounds for the bisection method
low_high=vector("list",2)
low_high[["low"]]=rep(0,length(x1))
low_high[["high"]]=rep(1,length(x1))
low_high_list=rep(list(low_high),50)

ll=function(theta)
{
  names(theta)=c("b1","m1","b2","m2")
  b1=theta[["b1"]]
  m1=theta[["m1"]]
  b2=theta[["b2"]]
  m2=theta[["m2"]]

  #bisection function is used to find y which makes G=0
  bisection_function=function(prv,nxt)
  {
    low_high=prv
    #G and y are both vectors of the length of the data set (in this example, 469)
    y=(low_high[["low"]]+low_high[["high"]])/2
    G=-1+(x1/((log(-y/(y-1))-b1)/m1))+(x2/((log(-y/(y-1))-b2)/m2))
    low_high[["low"]][G>0]=y[G>0]
    low_high[["high"]][G<0]=y[G<0]
    return(low_high)
  }

  #Reduce is the fastest method I've found so far
  #(in other words, if there is a better way, I'm certainly open to suggestions!)
  low_high=Reduce(bisection_function,low_high_list)
  p=(low_high[["low"]]+low_high[["high"]])/2

  #sum of log likelihood for binomial distribution
  loglik=sum(log((gamma(1+n)/(gamma(1+k)*(gamma(1+n-k))))*((p^k)*((1-p)^(n-k)))))
  return(loglik)
}

theta.start=c(b1=-10,m1=10,b2=-10,m2=10)
mle=optim(theta.start,ll,control=list(fnscale=-1),hessian=TRUE)
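
As an aside, the gamma-based likelihood line can overflow once n gets large. An equivalent, more stable form uses the built-in binomial density, which computes the same sum of binomial log likelihoods (binomial coefficient included):

loglik=sum(dbinom(k,n,p,log=TRUE))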

Thanks!!

Eric
  • That is not a programming question: your function `y=y(b1,m1,b2,m2)` is defined implicitly by `G(y,b1,m1,b2,m2)=0`; you can compute its derivative as in this [wikipedia article](https://en.wikipedia.org/wiki/Implicit_function_theorem#Regularity). – Vincent Zoonekynd Sep 25 '13 at 08:07
  • Sorry, I can't seem to grasp the application of this solution. Would the gradient function just consist of partial derivatives of `G(y,b1,m1,b2,m2)` with respect to `b1, m1, b2, m2`, or is it somehow incorporated into the `loglik` equation? Also, would the gradient function require iteration via bisection method as well? – Eric Sep 25 '13 at 18:49
  • The gradient of `y` can be computed from the partial derivatives of `G`, and that does not require any programming (no iterations): it is actually much easier to compute the derivatives of `y` than to compute `y` itself. Once you know the gradient of `y`, you can use it to compute the gradient of the `loglik` function. (But you still need the actual value of `y`: it will appear in the gradient.) – Vincent Zoonekynd Sep 25 '13 at 21:04
  • Excellent! The analytic gradient matches numeric gradient approximations (I'll post an answer after I write reproducible code for it). Before I do, one issue I'm curious about: my analytic Hessian (which I am pretty sure is correct) does not match numeric Hessian approximation values. They both give similar standard error estimates, but the raw matrix values look very different (different signs, different orders of magnitude etc). Is it feasible that an analytic Hessian would be different from the optimizer's numeric approximation, or should I assume that my Hessian calculations are incorrect? – Eric Sep 29 '13 at 20:06
  • Since `y` is computed algorithmically, it can be imprecise, and each differentiation amplifies those imprecisions: this could be the cause of the difference. In this case, the numeric Hessian is probably wrong. However, the numeric derivative of your analytic gradient should match your analytic Hessian. – Vincent Zoonekynd Sep 29 '13 at 21:02

1 Answer


Using Vincent's suggestions, I was able to supply an analytic gradient via implicit differentiation. In case anyone else runs into a similar problem, I've included reproducible code below (to be run after the code in the question).
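
In short: by the implicit function theorem, at the root p of G(p)=0,

    dp/d(theta) = -(dG/d(theta)) / (dG/dp)

for each parameter theta in {b1, m1, b2, m2}, and by the chain rule the gradient of the log likelihood is

    d(loglik)/d(theta) = sum( (k/p - (n-k)/(1-p)) * dp/d(theta) )

The code below builds these derivatives symbolically with D() and evaluates them at the bisection solution for p.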

Gexpression=parse(text="-1+(x1/((log(-p/(p-1))-b1)/m1))+(x2/((log(-p/(p-1))-b2)/m2))")

nested=function(theta)
{
  names(theta)=c("b1","m1","b2","m2")
  b1=theta[["b1"]]
  m1=theta[["m1"]]
  b2=theta[["b2"]]
  m2=theta[["m2"]]

  #bisection function is used to find y which makes G=0
  bisection_function=function(prv,nxt)
  {
    low_high=prv
    #G and y are both vectors of the length of the data set (in this example, 469)
    y=(low_high[["low"]]+low_high[["high"]])/2
    G=-1+(x1/((log(-y/(y-1))-b1)/m1))+(x2/((log(-y/(y-1))-b2)/m2))
    low_high[["low"]][G>0]=y[G>0]
    low_high[["high"]][G<0]=y[G<0]
    return(low_high)
  }

  low_high=Reduce(bisection_function,low_high_list)
  p=(low_high[["low"]]+low_high[["high"]])/2
  return(p)
}

gr=function(theta)
{
  names(theta)=c("b1","m1","b2","m2")
  b1=theta[["b1"]]
  m1=theta[["m1"]]
  b2=theta[["b2"]]
  m2=theta[["m2"]]
  p=nested(theta)

  # dll is the derivative of the loglik function with respect to any parameter:
  #   d_any is the partial derivative of p with respect to that parameter
  dll=function(d_any) (((k / p) * d_any) - (((n - k) / (1 - p))*d_any))

  #fd_any computes the partial derivative of p with respect to any parameter
  #  via implicit differentiation: dp/d(any) = -(dG/d(any))/(dG/dp)
  fd_any=function(any) eval(parse(text=paste("-((",as.character(list(D(Gexpression,any))),")/(",as.character(list(D(Gexpression,'p'))),"))",sep="")))

  DLb1=dll(fd_any("b1"))
  DLb2=dll(fd_any("b2"))
  DLm1=dll(fd_any("m1"))
  DLm2=dll(fd_any("m2"))

  DLb1[is.na(DLb1)]=0
  DLb2[is.na(DLb2)]=0
  DLm1[is.na(DLm1)]=0
  DLm2[is.na(DLm2)]=0

  colSums(cbind(b1=DLb1,m1=DLm1,b2=DLb2,m2=DLm2))
}

hs=function(theta)
{
  names(theta)=c("b1","m1","b2","m2")
  b1=theta[["b1"]]
  m1=theta[["m1"]]
  b2=theta[["b2"]]
  m2=theta[["m2"]]
  p=nested(theta)

  #fd_any_fun returns (as text) the implicit partial derivative of p
  #  with respect to any parameter
  fd_any_fun=function(any) paste("(-((",as.character(list(D(Gexpression,any))),")/(",as.character(list(D(Gexpression,'p'))),")))",sep="")
  #dll_fun wraps that text in the derivative of the log likelihood
  dll_fun=function(d_any_fun) paste("((k / p) * (",d_any_fun,")) - (((n - k) / (1 - p))*(",d_any_fun,"))",sep="")

  #entry (i,j) of the Hessian is the second partial derivative of the
  #  log likelihood with respect to parameters i and j, summed over the data
  params=c("b1","m1","b2","m2")
  h=matrix(0,4,4,dimnames=list(params,params))
  for(i in params)
  {
    for(j in params)
    {
      hij=eval(parse(text=D(parse(text=dll_fun(fd_any_fun(i))),j)))
      hij[is.na(hij)]=0
      h[i,j]=sum(hij)
    }
  }
  return(h)
}
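
To make optim() actually use the gradient, pass it in along with a gradient-based method (the default Nelder-Mead ignores the gr argument), e.g.:

mle=optim(theta.start,ll,gr=gr,method="BFGS",control=list(fnscale=-1),hessian=TRUE)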

The gradient seems to work fine. For some reason, though, the Hessian matrix estimated numerically by optim() differs from the analytic Hessian calculated by hs(). At least the resulting standard errors are of the same order of magnitude:

# Standard errors from optim Hessian
sqrt(abs(diag(solve(mle$hessian))))
# Standard errors from analytic Hessian
sqrt(abs(diag(solve(hs(mle$par)))))
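
Per Vincent's last comment, the remaining check is to compare the analytic Hessian against a numeric derivative of the analytic gradient, e.g. with the numDeriv package (if installed):

library(numDeriv)
#the numeric Jacobian of the analytic gradient should match hs(mle$par)
jacobian(gr,mle$par)
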
Eric