Using scipy.optimize.curve_fit with weights

Question

According to the documentation, the argument sigma can be used to set the weights of the data points in the fit. These "describe" 1-sigma errors when the argument absolute_sigma=True.

I have some data with artificial normally-distributed noise which varies:

import numpy as np
from scipy.optimize import curve_fit

n = 200
x = np.linspace(1, 20, n)
x0, A, alpha = 12, 3, 3

def f(x, x0, A, alpha):
    return A * np.exp(-((x-x0)/alpha)**2)

noise_sigma = x/20
noise = np.random.randn(n) * noise_sigma
yexact = f(x, x0, A, alpha)
y = yexact + noise

If I want to fit the noisy y to f using curve_fit, what should I set sigma to? The documentation isn't very specific here, but I would usually use 1/noise_sigma**2 as the weight:

p0 = 10, 4, 2
popt, pcov = curve_fit(f, x, y, p0)
popt2, pcov2 = curve_fit(f, x, y, p0, sigma=1/noise_sigma**2, absolute_sigma=True)

It doesn't seem to improve the fit much, though.


Is this option only used to better interpret the fit uncertainties through the covariance matrix? What is the difference between these two telling me?

In [249]: pcov
Out[249]: 
array([[  1.10205238e-02,  -3.91494024e-08,   8.81822412e-08],
       [ -3.91494024e-08,   1.52660426e-02,  -1.05907265e-02],
       [  8.81822412e-08,  -1.05907265e-02,   2.20414887e-02]])

In [250]: pcov2
Out[250]: 
array([[ 0.26584674, -0.01836064, -0.17867193],
       [-0.01836064,  0.27833   , -0.1459469 ],
       [-0.17867193, -0.1459469 ,  0.38659059]])

When you say it doesn't seem to improve the fit much, what were you expecting to see? — Croad Langshan, Dec 29 '14 at 21:51
Herds of wildebeest sweeping majestically across the plain. Or failing that, I thought that the rms fit residual would be better in the "with-sigma" case, but it's worse (0.64 vs 1.07). — xnx, Dec 30 '14 at 00:25
LOL, wildebeeste. Doesn't the unweighted algorithm minimize the rms though (looking back to dimly remembered days when I did a lot of curve fitting)? In which case, surely weighting would only be expected to increase it? You're telling it "don't worry too much about these points over here, fit these other points better even at the cost of overall rms". — Croad Langshan, Dec 30 '14 at 01:54
Just a note: R's nls takes weights and it looks like that Python's `sigma` corresponds to the square root of nls' weights. — user3637203, Feb 15 '18 at 12:04

score 11 · Answer 1 · answered Dec 15 '18 at 01:44

At least with scipy version 1.1.0, the parameter sigma should be equal to the standard deviation of the error on each data point (not a weight such as 1/sigma**2). Specifically, the documentation says:

A 1-d sigma should contain values of standard deviations of errors in ydata. In this case, the optimized function is chisq = sum((r / sigma) ** 2).

In your case that would be:

curve_fit(f, x, y, p0, sigma=noise_sigma, absolute_sigma=True)

I looked through the source code and verified that when you specify sigma this way it minimizes ((f-data)/sigma)**2.

As a side note, this is in general what you want to be minimizing when you know the errors. The likelihood of observing points data given a model f is given by:

L(data|x0,A,alpha) = product over i Gaus(data_i, mean=f(x_i,x0,A,alpha), sigma=sigma_i)

which if you take the negative log becomes (up to constant factors that don't depend on the parameters):

-log(L) = sum over i (f(x_i,x0,A,alpha)-data_i)**2/(sigma_i**2)

which is just the chisquare.
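Spelled out, assuming independent Gaussian errors with known standard deviations sigma_i (the symbol names below mirror the plain-text formulas above), the step from the likelihood to the chisquare looks like this. The factor of 1/2 and the additive term are the constants that do not depend on the parameters:

```latex
\begin{aligned}
L(\mathrm{data} \mid x_0, A, \alpha)
  &= \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_i}
     \exp\!\left(-\frac{\bigl(f(x_i; x_0, A, \alpha) - \mathrm{data}_i\bigr)^2}{2\sigma_i^2}\right) \\
-\log L
  &= \frac{1}{2}\sum_i \frac{\bigl(f(x_i; x_0, A, \alpha) - \mathrm{data}_i\bigr)^2}{\sigma_i^2}
     + \sum_i \log\!\bigl(\sqrt{2\pi}\,\sigma_i\bigr) \\
  &= \tfrac{1}{2}\,\chi^2 + \text{const}
\end{aligned}
```

Minimizing -log(L) over x0, A, alpha is therefore the same as minimizing the chisquare.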

I wrote a test program to verify that curve_fit was indeed returning the correct values with the sigma specified correctly:

from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit, fmin

np.random.seed(0)

def make_chi2(x, data, sigma):
    def chi2(args):
        x0, A, alpha = args
        return np.sum(((f(x,x0,A,alpha)-data)/sigma)**2)
    return chi2

n = 200
x = np.linspace(1, 20, n)
x0, A, alpha = 12, 3, 3

def f(x, x0, A, alpha):
    return A * np.exp(-((x-x0)/alpha)**2)

noise_sigma = x/20
noise = np.random.randn(n) * noise_sigma
yexact = f(x, x0, A, alpha)
y = yexact + noise

p0 = 10, 4, 2

# curve_fit without sigma (all points weighted equally, as if sigma were one)
popt, pcov = curve_fit(f, x, y, p0)
# curve_fit with wrong sigma specified
popt2, pcov2 = curve_fit(f, x, y, p0, sigma=1/noise_sigma**2, absolute_sigma=True)
# curve_fit with correct sigma
popt3, pcov3 = curve_fit(f, x, y, p0, sigma=noise_sigma, absolute_sigma=True)

chi2 = make_chi2(x,y,noise_sigma)

# double checking that we get the correct answer
xopt = fmin(chi2,p0,xtol=1e-10,ftol=1e-10)

print("popt  = %s, chi2 = %.2f" % (popt,chi2(popt)))
print("popt2 = %s, chi2 = %.2f" % (popt2, chi2(popt2)))
print("popt3 = %s, chi2 = %.2f" % (popt3, chi2(popt3)))
print("xopt  = %s, chi2 = %.2f" % (xopt, chi2(xopt)))

which outputs:

popt  = [ 11.93617403   3.30528488   2.86314641], chi2 = 200.66
popt2 = [ 11.94169083   3.30372955   2.86207253], chi2 = 200.64
popt3 = [ 11.93128545   3.333727     2.81403324], chi2 = 200.44
xopt  = [ 11.93128603   3.33373094   2.81402741], chi2 = 200.44

As you can see, the chi2 is indeed minimized correctly when you specify sigma=noise_sigma as an argument to curve_fit.

As to why the improvement isn't "better", I'm not really sure. My only guess is that without specifying a sigma value you implicitly assume they are equal and over the part of the data where the fit matters (the peak), the errors are "approximately" equal.
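A rough way to check that guess is to compare how much noise_sigma varies over the whole x range with how much it varies over the peak alone. This is only a sketch: the cut at 10% of the amplitude used to define "the peak" is an arbitrary assumption, and the names peak, ratio_all and ratio_peak are made up for the illustration.

```python
import numpy as np

# Same model and noise profile as in the question.
x0, A, alpha = 12, 3, 3
x = np.linspace(1, 20, 200)
noise_sigma = x / 20

def f(x, x0, A, alpha):
    return A * np.exp(-((x - x0) / alpha) ** 2)

# "Peak" region: points where the exact signal exceeds 10% of its amplitude.
# The 10% threshold is an arbitrary choice for this illustration.
peak = f(x, x0, A, alpha) > 0.1 * A

# Ratio of largest to smallest noise sigma, overall and within the peak.
ratio_all = noise_sigma.max() / noise_sigma.min()
ratio_peak = noise_sigma[peak].max() / noise_sigma[peak].min()

print("sigma max/min over all x:    %.2f" % ratio_all)
print("sigma max/min over the peak: %.2f" % ratio_peak)
```

Over the full range the noise level changes by a factor of 20, but over the peak it changes by only a little more than a factor of two, so equal weights are not a terrible approximation where the parameters are actually constrained.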

To answer your second question: no, the sigma option is not only used to change the output of the covariance matrix; it actually changes what is being minimized.
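One way to separate the two effects is to keep the relative weights fixed and change only the overall scale of sigma. The sketch below reuses the question's model and noise. The variable names popt_a through popt_d are my own, and the factor of 2 is an arbitrary choice. Since chisq = sum((r / sigma) ** 2) is only multiplied by a constant, the best-fit parameters should not move; what changes is how pcov is scaled, depending on absolute_sigma.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, x0, A, alpha):
    return A * np.exp(-((x - x0) / alpha) ** 2)

# Same synthetic data as in the question, with a fixed seed.
rng = np.random.RandomState(0)
n = 200
x = np.linspace(1, 20, n)
noise_sigma = x / 20
y = f(x, 12, 3, 3) + rng.randn(n) * noise_sigma
p0 = 10, 4, 2

# Same relative weights, overall scale of sigma doubled (arbitrary factor).
popt_a, pcov_a = curve_fit(f, x, y, p0, sigma=noise_sigma, absolute_sigma=True)
popt_b, pcov_b = curve_fit(f, x, y, p0, sigma=2 * noise_sigma, absolute_sigma=True)
popt_c, pcov_c = curve_fit(f, x, y, p0, sigma=noise_sigma, absolute_sigma=False)
popt_d, pcov_d = curve_fit(f, x, y, p0, sigma=2 * noise_sigma, absolute_sigma=False)

# Parameter errors are the square roots of the diagonal of pcov.
print("popt, sigma:              %s" % popt_a)
print("popt, 2*sigma:            %s" % popt_b)
print("perr, absolute, sigma:    %s" % np.sqrt(np.diag(pcov_a)))
print("perr, absolute, 2*sigma:  %s" % np.sqrt(np.diag(pcov_b)))
print("perr, relative, sigma:    %s" % np.sqrt(np.diag(pcov_c)))
print("perr, relative, 2*sigma:  %s" % np.sqrt(np.diag(pcov_d)))
```

With absolute_sigma=True the parameter errors double along with sigma. With absolute_sigma=False the covariance is rescaled by the reduced chisquare, so the overall scale of sigma drops out and only the relative weights matter. In both cases the fitted parameters agree, because the shape of sigma across the data points decides where the minimum is.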

@KornpobBhirombhakdi if you know the noise term then you can just subtract it from the data and then you have a *perfect* signal, and you don't even really need to fit anything. For real data, you often know the standard deviation of the error but you don't know the *actual* error for each data point and that is why you fit. — user545424, May 23 '19 at 23:39
