21

I'm trying to fit a histogram with some data in it using scipy.optimize.curve_fit. If I want to add an error in y, I can simply do so by applying a weight to the fit. But how to apply the error in x (i. e. the error due to binning in case of histograms)?

My question also applies to errors in x when making a linear regression with curve_fit or polyfit; I know how to add errors in y, but not in x.

Here an example (partly from the matplotlib documentation):

import numpy as np
import pylab as P
from scipy.optimize import curve_fit

# create the data histogram
mu, sigma = 200, 25
x = mu + sigma*P.randn(10000)

# define fit function
def gauss(x, *p):
    A, mu, sigma = p
    return A*np.exp(-(x-mu)**2/(2*sigma**2))

# the histogram of the data
n, bins, patches = P.hist(x, 50, histtype='step')
sigma_n = np.sqrt(n)  # Adding Poisson errors in y
bin_centres = (bins[:-1] + bins[1:])/2
sigma_x = (bins[1] - bins[0])/np.sqrt(12)  # Binning error in x
P.setp(patches, 'facecolor', 'g', 'alpha', 0.75)

# fitting and plotting
p0 = [700, 200, 25]
popt, pcov = curve_fit(gauss, bin_centres, n, p0=p0, sigma=sigma_n, absolute_sigma=True)
x = np.arange(100, 300, 0.5)
fit = gauss(x, *popt)
P.plot(x, fit, 'r--')

Now, this fit (when it doesn't fail) does consider the y-errors sigma_n, but I haven't found a way to make it consider sigma_x. I scanned a couple of threads on the scipy mailing list and found out how to use the absolute_sigma value and a post on Stackoverflow about asymmetrical errors, but nothing about errors in both directions. Is it possible to achieve?

Community
  • 1
  • 1
Zollern
  • 305
  • 1
  • 2
  • 11
  • I do not know whether curve_fit can handle errors in x but scipy.optimize.odr does. Actually it does orthogonal distance regression rather than simple least squares on the dependent variable. – Christian K. Sep 26 '14 at 12:35
  • Thank you for your comment! I didn't find another fit function (odr is in scipy.odr, by the way, not in scipy.optimize.odr). It works perfectly, thanks! If you post your comment as an answer, I'm happy to accept it as a solution. :-) – Zollern Sep 27 '14 at 11:22
  • @ChristianK. you could post your comment as an answer... – Saullo G. P. Castro Sep 28 '14 at 09:03

1 Answers1

27

scipy.optmize.curve_fit uses standard non-linear least squares optimization and therefore only minimizes the deviation in the response variables. If you want to have an error in the independent variable to be considered you can try scipy.odr which uses orthogonal distance regression. As its name suggests it minimizes in both independent and dependent variables.

Have a look at the sample below. The fit_type parameter determines whether scipy.odr does full ODR (fit_type=0) or least squares optimization (fit_type=2).

EDIT

Although the example worked it did not make much sense, since the y data was calculated on the noisy x data, which just resulted in an unequally spaced indepenent variable. I updated the sample which now also shows how to use RealData which allows for specifying the standard error of the data instead of the weights.

from scipy.odr import ODR, Model, Data, RealData
import numpy as np
from pylab import *

def func(beta, x):
    y = beta[0]+beta[1]*x+beta[2]*x**3
    return y

#generate data
x = np.linspace(-3,2,100)
y = func([-2.3,7.0,-4.0], x)

# add some noise
x += np.random.normal(scale=0.3, size=100)
y += np.random.normal(scale=0.1, size=100)

data = RealData(x, y, 0.3, 0.1)
model = Model(func)

odr = ODR(data, model, [1,0,0])
odr.set_job(fit_type=2)
output = odr.run()

xn = np.linspace(-3,2,50)
yn = func(output.beta, xn)
hold(True)
plot(x,y,'ro')
plot(xn,yn,'k-',label='leastsq')
odr.set_job(fit_type=0)
output = odr.run()
yn = func(output.beta, xn)
plot(xn,yn,'g-',label='odr')
legend(loc=0)

fit to noisy data

Christian K.
  • 2,785
  • 18
  • 40
  • 1
    Nice answer! Do you know the difference between `output.sd_beta` and `np.sqrt(np.diag(output.cov_beta))` ? Which one correspond to the uncertainties on the parameters ? – Ger Dec 07 '16 at 23:03
  • Thanks. The scipy docs refer to the original paper. All information should be in there. I use sd_beta as uncertainties on the paramters. – Christian K. Dec 09 '16 at 01:09
  • Actually there is probably a bug in scipy or ODR due to sb_beta and cov_beta. I asked a question about that http://stackoverflow.com/questions/41028846/how-to-compute-standard-error-from-odr-results – Ger Dec 09 '16 at 07:49
  • @ChristianK. as far as I can tell the bug report is open https://github.com/scipy/scipy/issues/6842 What makes you say it was not a bug? – Gabriel Jun 21 '17 at 19:04
  • The very same user who wrote that post calls it a bug and he opened the [bug report](https://github.com/scipy/scipy/issues/6842) (which is still open and acknowledged by scipy devs). If you do not believe this to be a bug, you should go over there and explain your reasons. – Gabriel Jun 21 '17 at 22:38
  • @ChristianK. but the docs aren't right. If `cov_beta` indeed returned the `Covariance matrix of the estimated parameters`, then the sqrt of its diagonal would results in the standard devs of the parameters, equal to `sd_beta`. It does not, because what it actually returns is a *scaled* covariance matrix. The docs do not mention this, so it is a bug. Either in the implementation of the package, or in the docs. – Gabriel Jun 22 '17 at 17:06
  • @Christian, Thanks for the answer. Is there anyway to retrieve or extract best numerical values of fit parameters beta[0] and beta[1] including their 1-sigma errorbars for a known best fit model of the form y=beta[0]*x+beta[1]? simply output.beta returns the same values initialized in ODR function! – Rebel Nov 15 '17 at 08:00
  • @Allan, if that happens, the fit did not converge, i.e. was not succesful. Try with improving the initial guess. What is the value of `output.stopreason`? – Christian K. Nov 15 '17 at 12:39
  • value is ['Numerical error detected']. My function is not exactly linear but a piecewise function such that below xo, it gives m*x-m*xo and above xo it gives 0. – Rebel Nov 15 '17 at 18:21
  • Post your problem as question with code and data and I'll have a look at it. – Christian K. Nov 16 '17 at 19:28