I am using scipy.optimize.leastsq to attempt to fit a number of parameters to real-world data in the presence of noise. The objective function occasionally gets called with NaNs from within minpack. Is this the expected behavior of scipy.optimize.leastsq? Is there any better option than just returning NaN residuals under this condition?
The following code demonstrates the behavior:
import scipy.optimize
import numpy as np

xF = np.array([1.0, 2.0, 3.0, 4.0])  # Target values for the fit
NOISE_LEVEL = 1e-6                   # The random noise level
RETURN_LEN = 1000                    # The objective function return vector length

def func(x):
    if np.isnan(np.sum(x)):
        raise ValueError('Invalid x: %s' % x)
    v = np.random.rand(RETURN_LEN) * NOISE_LEVEL  # simulated measurement noise
    v[:len(x)] += xF - x                          # residuals for the fitted parameters
    return v

iteration = 0
while True:
    iteration += 1
    x = np.zeros(len(xF))
    y, ier = scipy.optimize.leastsq(func, x)  # default return is (solution, status flag)
    print('%04d %s' % (iteration, y))
The Jacobian is computed numerically. In the production code, the optimization normally works, except when the starting estimate is too good: the surface is then exceedingly flat, and noise overwhelms the deltas used to numerically compute the Jacobian. In that case the residuals of the objective function look like random noise, as in the code example above, and I would not expect the optimization to converge.
With small NOISE_LEVEL values (< 1e-10), this example always converges. At 1e-6, the ValueError is typically raised within a few hundred iterations.
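One thing that may be worth trying (an assumption on my part, not something I have verified on the production problem) is leastsq's epsfcn parameter, which sets the assumed relative error in the function values and thereby the forward-difference step length. Raising it above the noise level might keep the Jacobian deltas from being dominated by noise:

import scipy.optimize
import numpy as np

# Assumes func and xF from the example above. epsfcn is the assumed
# relative error in func's values; the forward-difference step is
# roughly sqrt(epsfcn) * x, so a value above the noise level may keep
# the deltas meaningful. 1e-8 here is an untuned guess.
x0 = np.zeros(len(xF))
y, ier = scipy.optimize.leastsq(func, x0, epsfcn=1e-8)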
One possible workaround is to return a highly penalized residual vector (all NaN or all inf) instead of raising, such as:

# Inside func, instead of raising the ValueError:
v = np.empty(RETURN_LEN)
v.fill(np.nan)
return v
This workaround seems effective, if inelegant. Are there any better alternatives, or ways to prevent the NaNs in the first place?
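For comparison, a complete guarded version of the objective could return large finite penalties instead of NaN, on the (unverified) assumption that MINPACK copes better with finite values; func_guarded and PENALTY are names I made up for illustration:

import numpy as np

PENALTY = 1e10  # hypothetical large-but-finite penalty magnitude

def func_guarded(x):
    # If minpack hands us non-finite parameters, return a large finite
    # residual vector instead of raising or returning NaN.
    if not np.all(np.isfinite(x)):
        return np.full(RETURN_LEN, PENALTY)
    v = np.random.rand(RETURN_LEN) * NOISE_LEVEL
    v[:len(x)] += xF - x
    return v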
This behavior was observed under Python 2.7.9 x32 running on Windows 7.