
I'm having an issue using scipy's minimize() function, and I don't understand enough about optimization to grasp what is going wrong here.

I have a function that calls scipy.optimize.minimize(). It works fine and gives me exactly the outputs I need when x0 is an array of size > 1, but when x0 has size exactly 1 it fails. The documentation says that x0 must be an np.ndarray of shape (n,), but it doesn't say that n must be > 1, so I assumed this would be fine. Here is a smaller version of my code, calling the function with the known optimal value:

import numpy as np
from scipy.optimize import minimize

def to_freq(*arrays):
    # Better version of `convert_to_freq()`
    out = []
    for a in arrays:
        converted = np.array([(x + i / len(a)) / (max(a)+1) for i, x in enumerate(a, start=1)])
        out.append(converted)
    return out

def likelihood(x, x_freq, expected, x_max):
    # Better version, supports vectorisation
    a = 2 * x * np.log(x_freq / expected) 
    b = 2 * (x_max - x) * np.log((1 - x_freq) / (1 - expected))
    return a + b

def objective(x0, labels, a, b):
    R = x0[labels=='R'].item()

    a_c, b_c = np.cumsum(a), np.cumsum(b)
    a_f, b_f = to_freq(a_c, b_c)

    # Get the expected values for signals and noises
    exp_a = ((1 - R) * b_f + R)[:-1]
    exp_b = b_f[:-1]

    # Compute the gsquared using the dual process model parameters
    #   Still getting runtime warnings about division. Function only works with numpy, so can't use math.
    a_lrat = likelihood(x=a_c[:-1], x_freq=a_f[:-1], expected=exp_a, x_max=a_c.max())
    b_lrat = likelihood(x=b_c[:-1], x_freq=b_f[:-1], expected=exp_b, x_max=b_c.max())

    return sum(a_lrat + b_lrat)

# Observations
a = [508,224,172,135,119,63]
b = [102,161,288,472,492,308]
x0 = np.array([0.520274590415736]) # Optimal value for variable
labels = np.array(['R'])

# Gives the correct optimized value of 163.27525607890783
objective(x0, labels, a, b)

And now randomly initializing x0 for cases when the optimal value is unknown:

x0 = np.random.uniform(-.5,0.5, len(labels)) # random initialization

# Without method='nelder-mead' this occasionally gives the correct value of fun, but frequently fails
opt = minimize(fun=objective, x0=x0, args=(labels, a, b), tol=1e-4)
print(opt)

The failed optimization result is this:

      fun: nan
 hess_inv: array([[1]])
      jac: array([nan])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 336
      nit: 1
     njev: 112
   status: 2
  success: False
        x: array([1034.74])

But if I keep running this and randomly setting the initial value, it occasionally spits out a good result:

      fun: 163.27525607888913
 hess_inv: array([[4.14149525e-05]])
      jac: array([-1.90734863e-05])
  message: 'Optimization terminated successfully.'
     nfev: 27
      nit: 7
     njev: 9
   status: 0
  success: True
        x: array([0.52027462])

If I specify method='nelder-mead' (a solution to a possibly unrelated problem) in the minimize() call within my bigger function, it does reliably give me the expected result.
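
For reference, this is roughly what that call looks like; only the method argument changes from the call above:

opt = minimize(fun=objective, x0=x0, args=(labels, a, b),
               method='nelder-mead', tol=1e-4)
print(opt)

which returns: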

 final_simplex: (array([[0.52026029],
       [0.52031204]]), array([163.27525856, 163.27527298]))
           fun: 163.2752585612531
       message: 'Optimization terminated successfully.'
          nfev: 32
           nit: 16
        status: 0
       success: True
             x: array([0.52026029])

I don't really understand what the best approach would be for implementing this since I am very inexperienced with optimization.

[Footnote]: The minimization algorithm sometimes tries values that are incompatible with my function (e.g. < 0 or > 1) and the call to np.log() ends up throwing a warning, but I normally just suppress the warning since it seems to work regardless...
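
In case it's relevant, one variant I've been toying with (not something I'm sure is the right approach) is passing explicit bounds so the solver never leaves (0, 1). The choice of L-BFGS-B and the 1e-6 margin are just placeholders on my part:

# Hypothetical variant: L-BFGS-B accepts a `bounds` argument, so R is kept
# strictly inside (0, 1); the eps margin is arbitrary.
eps = 1e-6
opt = minimize(fun=objective, x0=np.array([0.5]), args=(labels, a, b),
               method='L-BFGS-B', bounds=[(eps, 1 - eps)], tol=1e-4)
print(opt)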

  • What you write in your footnote is generally not a good idea. When you use the optimisation routines you generally have to guarantee that the variables you optimise can take any value from -inf to +inf. There are algorithms which let you define bounds, but these must be provided explicitly and only a subset of the optimization algorithms supports them. Often it is easy to reparametrize to fulfil the condition. E.g. if a function f(x) is only valid for x > 0, you can substitute x := exp(y) and f(exp(y)) will be valid for all y on the real axis (see the sketch after these comments). – cel May 05 '20 at 11:10
  • I'm not quite sure how I could implement that - the situation occurs during the call to `likelihood()`, where the `np.log()` function is used – fffrost May 05 '20 at 11:30
  • The Nelder-Mead method does not require computing function gradients, which may be why it works in your case. I find Nelder-Mead good in many cases on real data. If you can compute the gradient of your function, then gradient-based algorithms will usually work better. There is a nice explanation of these algorithms here: https://scipy-lectures.org/advanced/mathematical_optimization/ – Paddy Harrison May 05 '20 at 11:39
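
A minimal sketch of the reparametrisation cel suggests, assuming R should stay strictly inside (0, 1); the logistic transform and the objective_unconstrained wrapper name are illustrative additions, not something from the thread:

def objective_unconstrained(y, labels, a, b):
    # Map the unconstrained variable y onto (0, 1) with a logistic
    # transform, then evaluate the original objective there.
    return objective(1.0 / (1.0 + np.exp(-y)), labels, a, b)

y0 = np.random.uniform(-1, 1, len(labels))    # any real starting point is valid
opt = minimize(fun=objective_unconstrained, x0=y0, args=(labels, a, b), tol=1e-4)
R_hat = 1.0 / (1.0 + np.exp(-opt.x))          # map the solution back onto (0, 1)

With this wrapper the default gradient-based solver only ever sees values of R inside (0, 1), so np.log() should never receive a non-positive argument.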

0 Answers