I'm trying to optimize the marginal likelihood to estimate the parameters of a Gaussian process regression, so I defined the negative log marginal likelihood (the quantity to minimize) this way:

import numpy as np
from scipy.optimize import minimize

def marglike(par,X,Y):
    l,sigma_n = par                          # lengthscale and noise std
    n = len(X)
    dist_X = (X.T - X)**2                    # intended squared pairwise distances
    k = np.exp(-(1/(2*(l**2)))*dist_X)       # squared-exponential kernel
    inverse = np.linalg.inv(k + (sigma_n**2)*np.eye(len(k)))
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) \
         + (1/2)*np.log(np.linalg.det(k + (sigma_n**2)*np.eye(len(k)))) \
         + (n/2)*np.log(2*np.pi)
    return ml

The parameters to be optimized are l and sigma_n. With some initial values and data, the function returns a value:

X = np.linspace(1,10,20)
F = np.sin(X)
start = np.array([1,0.05])  # initial parameter values

marglike(start,X,F)
Out[75]: array([[1872.6511786]])

But when I try to optimize the parameters with minimize, I get this:

re = minimize(marglike,start,args=(X,F),method="BFGS",options={'disp':True})
Optimization terminated successfully.
     Current function value: 22.863446
     Iterations: 8
     Function evaluations: 60
     Gradient evaluations: 15

re.x
Out[89]: array([1.        , 0.70845989])

I don't know why, but the parameter l doesn't seem to be optimized; it just stays at the starting value that I fixed.

Any suggestions?

1 Answer


You need to reshape X to 2-D first for X.T - X to work: with a 1-D array, X.T is identical to X, so every entry of X.T - X is zero, the kernel matrix is constant regardless of l, and that is why l never moves from its starting value. Also, you need to add one more parameter, the signal variance (var in the code below), to the optimization. Let me know if the code below solves your problem.
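To see the broadcasting problem directly, here is a quick check (mine, not part of the original answer):

import numpy as np

X = np.linspace(1, 10, 20)
print(np.all((X.T - X) == 0))  # True: X.T equals X for a 1-D array, so every
                               # "distance" is zero and l has no effect at all

Xc = X.reshape(-1, 1)          # 2-D column vector
print((Xc - Xc.T).shape)       # (20, 20): the full matrix of pairwise differences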

import numpy as np
from scipy.optimize import minimize

def marglike(par,X,Y):
    l,var,sigma_n = par                      # lengthscale, signal variance, noise std
    n = len(X)
    dist_X = (X - X.T)**2                    # squared pairwise distances (X must be 2-D)
    k = var*np.exp(-(1/(2*(l**2)))*dist_X)   # scaled squared-exponential kernel
    inverse = np.linalg.inv(k + (sigma_n**2)*np.eye(len(k)))
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) \
         + (1/2)*np.log(np.linalg.det(k + (sigma_n**2)*np.eye(len(k)))) \
         + (n/2)*np.log(2*np.pi)
    return ml

X = np.linspace(1,10,20).reshape(-1,1)  # reshape to a 2-D column vector
F = np.sin(X)
start = np.array([1.1,1.6,0.05])        # initial parameter values
print(marglike(start,X,F))

re = minimize(marglike,start,args=(X,F),method="L-BFGS-B",options={'disp':True})
re.x
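As a side note (a sketch of my own, not part of the original answer; the name marglike_chol is illustrative): np.linalg.det can underflow or overflow for larger kernel matrices, and the explicit inverse is wasteful, so the same quantity is usually computed through a Cholesky factorization:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def marglike_chol(par,X,Y):
    # Same negative log marginal likelihood as above, computed via a
    # Cholesky factorization instead of an explicit inverse and determinant.
    l,var,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    K = var*np.exp(-dist_X/(2*l**2)) + (sigma_n**2)*np.eye(n)
    c, low = cho_factor(K)
    alpha = cho_solve((c, low), Y)            # K^{-1} Y without forming K^{-1}
    logdet = 2*np.sum(np.log(np.diag(c)))     # log|K| from the Cholesky diagonal
    return 0.5*float(Y.T @ alpha) + 0.5*logdet + 0.5*n*np.log(2*np.pi)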
– Zeel B Patel
  • Hi, first of all thanks for answering. The code works this way. Regarding the new parameter var, I left it out since it can be calculated in closed form (Surrogates by Robert B. Gramacy). Once I put it in the formula, shouldn't I square it? (as is done for l and sigma_n) – Pasquale Russo Feb 14 '21 at 09:56
  • Also, the result for sigma_n is a negative value; since it is a variance, how is this possible? – Pasquale Russo Feb 14 '21 at 10:21
  • I did not know it was possible to calculate var in closed form; thanks for letting me know. Can you point me to a resource explaining exactly how to do that? If we interpret var as a variance, there is no need to square it, but if as a standard deviation (sigma), we do need to square it. As for sigma_n being negative: we need to set bounds in the minimize method to avoid this (a sketch of this is below the thread). Generally I have seen bounds of (10^-5, 10^5), but anything that keeps it positive should also work – Zeel B Patel Feb 15 '21 at 09:51
  • Ah OK, I think I got it; so by the same logic, should I remove the squares on sigma_n and l?! Regarding the closed form, you can check this book: Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences by Robert B. Gramacy. If you can't find it, I can provide it for you! – Pasquale Russo Feb 15 '21 at 17:07
  • No, we should not remove the squares if we want to keep the usual interpretation of sigma_n (noise), l (lengthscale), and sigma_f (standard deviation) in the community. Though I am not sure, I think the square also helps with gradient calculation during optimization. I would suggest referring to the well-known library GPy and diving into its source code. Thanks a lot for the book, it looks great for applied sciences! – Zeel B Patel Feb 16 '21 at 02:42
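Putting the two threads above together, here is a sketch (mine, following the concentrated log likelihood described in Gramacy's Surrogates; the names and bound values are illustrative, not from the thread). It uses Gramacy's parameterization K = var*(C + g*I), so the nugget g plays the role of sigma_n**2/var rather than sigma_n itself; the scale var drops out in closed form and the remaining two parameters are bounded to stay positive:

import numpy as np
from scipy.optimize import minimize

def kernel_n(l, g, X):
    # unit-scale squared-exponential kernel plus nugget g on the diagonal
    return np.exp(-((X - X.T)**2)/(2*l**2)) + g*np.eye(len(X))

def marglike_conc(par, X, Y):
    # Concentrated (profile) negative log likelihood: with K = var*Kn, the
    # scale has the closed-form MLE var_hat = Y' Kn^{-1} Y / n, so only the
    # lengthscale l and the nugget g are optimized (constants in n dropped).
    l, g = par
    n = len(X)
    Kn = kernel_n(l, g, X)
    quad = float(Y.T @ np.linalg.solve(Kn, Y))
    return 0.5*n*np.log(quad) + 0.5*np.linalg.slogdet(Kn)[1]

X = np.linspace(1, 10, 20).reshape(-1, 1)
F = np.sin(X)

# L-BFGS-B accepts bounds, which keeps both parameters strictly positive
re = minimize(marglike_conc, np.array([1.0, 0.1]), args=(X, F),
              method="L-BFGS-B", bounds=[(1e-5, 1e5), (1e-8, 1e2)])
l_hat, g_hat = re.x
var_hat = float(F.T @ np.linalg.solve(kernel_n(l_hat, g_hat, X), F)) / len(F)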