
I am currently trying to implement a Gaussian process in Mathematica and am stuck on maximizing the log-likelihood. I tried using FindMaximum on my log-likelihood function, but it does not seem to work on this function.

gpdata = {{-1.5, -1.8}, {-1., -1.2}, {-0.75, -0.4}, {-0.4, 0.1},
   {-0.25, 0.5}, {0., 0.8}};

kernelfunction[i_, j_, h0_, h1_] :=
  h0*h0*Exp[-(gpdata[[i, 1]] - gpdata[[j, 1]])^2/(2*h1^2)] +
   KroneckerDelta[i, j]*0.09;

covariancematrix[h0_, h1_] =
  ParallelTable[kernelfunction[i, j, h0, h1], {i, 1, 6}, {j, 1, 6}];

loglikelihood[h0_, h1_] :=
  -0.5*gpdata[[All, 2]].LinearSolve[covariancematrix[h0, h1],
      gpdata[[All, 2]], Method -> "Cholesky"] -
   0.5*Log[Det[covariancematrix[h0, h1]]] - 3*Log[2*Pi];

FindMaximum[loglikelihood[a, b], {{a, 1}, {b, 1.1}},
  MaxIterations -> 500, Method -> "QuasiNewton"]

In the log-likelihood I would usually have the inverse of the covariance matrix applied to the gpdata[[All, 2]] vector, but because the covariance matrix is always positive semidefinite I wrote it with LinearSolve and a Cholesky factorization instead. The evaluation also never finishes if I use gpdata[[All, 2]].Inverse[covariancematrix[h0, h1]].gpdata[[All, 2]] directly.
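Written out as a definition (loglikelihoodInv is just a name I am using here for that variant), the direct-inverse version would be:

(* same log-likelihood, but with an explicit matrix inverse;
   this is the variant whose evaluation never finishes *)
loglikelihoodInv[h0_, h1_] :=
  -0.5*gpdata[[All, 2]].Inverse[covariancematrix[h0, h1]].gpdata[[All, 2]] -
   0.5*Log[Det[covariancematrix[h0, h1]]] - 3*Log[2*Pi];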

Does anyone have an idea? I am actually working on a far more complicated problem with 6 parameters to optimize, but I already run into problems with 2.

1 Answer


In my experience, second-order methods fail at hyper-parameter optimization more often than gradient-based methods do. I think this is because (most?) second-order methods rely on the function being close to a quadratic near the current estimate.

Using conjugate gradient or even Powell's (derivative-free) conjugate direction method has proved successful in my experiments. For the two-parameter case, I would also suggest making a contour plot of the hyper-parameter surface to build some intuition; a sketch of both follows below.
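A rough sketch of how that could look in Mathematica, reusing gpdata and covariancematrix from the question. Note that I have also cleared loglikelihood and restricted it to numeric arguments with ?NumericQ; that restriction is my guess at why FindMaximum stalls (otherwise it tries to expand the LinearSolve call symbolically), so treat it as an assumption rather than a confirmed fix. The plot ranges are arbitrary.

(* redefine the log-likelihood so it only evaluates for explicit numbers;
   FindMaximum and ContourPlot then never see a symbolic covariance matrix *)
Clear[loglikelihood];
loglikelihood[h0_?NumericQ, h1_?NumericQ] :=
  -0.5*gpdata[[All, 2]].LinearSolve[covariancematrix[h0, h1],
      gpdata[[All, 2]], Method -> "Cholesky"] -
   0.5*Log[Det[covariancematrix[h0, h1]]] - 3*Log[2*Pi];

(* gradient-based: conjugate gradient, derivatives via finite differences *)
FindMaximum[loglikelihood[a, b], {{a, 1}, {b, 1.1}},
  Method -> "ConjugateGradient", MaxIterations -> 500]

(* derivative-free: principal-axis (Powell-style) search;
   this method expects two starting values per variable *)
FindMaximum[loglikelihood[a, b], {{a, 1, 1.2}, {b, 1.1, 1.3}},
  Method -> "PrincipalAxis", MaxIterations -> 500]

(* contour plot of the hyper-parameter surface for intuition;
   adjust the ranges to whatever region is of interest *)
ContourPlot[loglikelihood[h0, h1], {h0, 0.1, 3}, {h1, 0.1, 3},
  Contours -> 30]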