
I am working on a detailed piece of code that requires optimization, which I have simplified for this MWE. I am trying to find the optimal value of arg_opt that minimizes the value returned by a different function.

I believe this is either a simple error or a gap in my understanding, but shouldn't the final optimized solution be independent of the initial guess (at least for small variations like in this case)? For this MWE I get the same minimized value, but the final value of x is different. I would have expected only minor differences; what is the source of this discrepancy?

MWE

import numpy as np
from scipy import optimize

def fn_cubic(arg_1, arg_2, arg_3, data):
    return (arg_1 ** 3 + arg_2 ** 2 + arg_3 + np.sum(np.exp(data))) / 100

arg_opt_1 = np.ones(shape=(3)) * 2
arg_opt_2 = np.ones(shape=(3)) * 3

data_original = [1, 5, 4, 10, 3, 9, 6, 3]
data = np.zeros(shape=len(data_original))
pos_data = np.array([1, 3, 2])

def function_to_optimize(arg_opt, arg_1, arg_2, arg_3):
    for x, y in enumerate(arg_opt):
        data[pos_data[x]] = data_original[pos_data[x]] * y
    value = fn_cubic(arg_1, arg_2, arg_3, data)
    return value

opt_sol_1 = optimize.minimize(function_to_optimize, arg_opt_1, args=(0.1, 0.2, 0.3))
opt_sol_2 = optimize.minimize(function_to_optimize, arg_opt_2, args=(0.1, 0.2, 0.3))


print(' 1:', opt_sol_1.x, '\n','2:', opt_sol_2.x)

Output

 1: [-1.10240891e+03 -9.28714306e-01 -1.17584215e+02] 
 2: [-1.98936327e+03 -9.68415948e-01 -1.53438039e+03]
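
For reference, the minimized values can be compared directly with a quick check along these lines (a small addition to the MWE, not part of the printed output above):

# Compare the minimized objective values; the argument vectors above differ,
# but the function values agree to within the solver's tolerance.
print('fun 1:', opt_sol_1.fun)
print('fun 2:', opt_sol_2.fun)
print('same minimum:', np.isclose(opt_sol_1.fun, opt_sol_2.fun))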
Tom Kurushingal
  • Why are you using such a convoluted function example? If you're just passing 0.1, 0.2, 0.3 as constants, you don't need those in your "minimal" example. – BrenBarn Feb 02 '22 at 05:13
  • Wrt `arg_opt_1` & `arg_opt_2` being 2 & 3, which is the only difference: 3 is 50% more than 2, and 2 is 33% less than 3. It's being used in the context of numbers of the same magnitude [1, 5, 4, 10, 3, 9, 6, 3]. You're then using your cubic function to make the differences even larger - similar to what's done in pseudo-random number generators to make small differences in inputs result in vastly different outputs. – aneroid Feb 02 '22 at 05:28
  • @aneroid: The cubic function is a red herring. The values being put into the cubic function are just the constants 0.1, 0.2, and 0.3 passed as extra args to the optimizer. Only the `data` value (which is calculated from `arg_opt`) actually affects the result. – BrenBarn Feb 02 '22 at 05:39
  • @BrenBarn Thanks! yeah, I re-see that now, wrt the cubic function. It's really just `np.sum(np.exp(data))` affecting the output. – aneroid Feb 02 '22 at 05:47
  • @aneroid: Yes, the question has a lot of extraneous stuff that makes it hard to see what the function being optimized actually is. – BrenBarn Feb 02 '22 at 05:47

1 Answer


There's no particular guarantee about the relationship between the initial guess and the point found by the optimizer. You can even get different x values by giving the same initial guess and using different solver methods.
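
As a quick illustration (a sketch reusing the setup from the question; the BFGS/Nelder-Mead pairing is just an example), running two different methods from the same starting point will typically agree on the minimized value but not on the returned x:

# Same initial guess, two different solver methods: the fun values come out
# close, but the x vectors can land in very different places.
sol_a = optimize.minimize(function_to_optimize, arg_opt_1, args=(0.1, 0.2, 0.3), method='BFGS')
sol_b = optimize.minimize(function_to_optimize, arg_opt_1, args=(0.1, 0.2, 0.3), method='Nelder-Mead')
print(sol_a.x, sol_a.fun)
print(sol_b.x, sol_b.fun)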

One thing to keep in mind is that the function you're choosing to optimize is kind of weird. The only way it uses the "data" is to exponentiate it and sum it. This means that it is "collapsing" a lot of potential variation in the argument. (For instance, permuting the values of data will not change the value of the objective function.)

In other words, there are many different "data" values that will give the same result, so it's not surprising that the solver sometimes finds different ones.
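
For example (a sketch with made-up numbers, reusing the question's definitions), two different arg_opt vectors that write the same three numbers into data, just at different positions, give exactly the same objective value:

# data_original at pos_data = [1, 3, 2] is [5, 10, 4], so both guesses below
# put the values {10, 5, 4} into data, only in a different order.
arg_a = np.array([2.0, 0.5, 1.0])   # data[1]=10, data[3]=5,  data[2]=4
arg_b = np.array([0.8, 1.0, 1.25])  # data[1]=4,  data[3]=10, data[2]=5
print(function_to_optimize(arg_a, 0.1, 0.2, 0.3))
print(function_to_optimize(arg_b, 0.1, 0.2, 0.3))  # same value as arg_a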

BrenBarn