What is the easiest way to get uncertainties on linear regression parameters?

Question

I used to run the DROITEREG functions in a calc sheet. Here is an example :

At the top left, there are the data and at the bottom the results of the function DROITEREG which are a 2 by 5 table. I wrote the labels of several cells. a and b are the parameters of the linear regression and u(a) and u(b) the uncertaintes on a and b. I would like to compute these uncertaintes from a numpy functions.

I succeed from curve_fit function :

import numpy as np
from scipy.stats import linregress
from scipy.optimize import curve_fit

data_o = """
0.42    2.0
0.97    5.0
1.71   10.0
2.49   20.0
3.53   50.0
3.72  100.0
"""
vo, So = np.loadtxt(data_o.split("\n"), unpack=True)

def f_model(x, a, b):
    return a * x + b

popt, pcov = curve_fit(
    f=f_model,     # model function
    xdata=1 / So,  # x data
    ydata=1 / vo,  # y data
    p0=(1, 1),     # initial value of the parameters
)

# parameters
print(popt)
# uncertaintes :
print(np.sqrt(np.diag(pcov)))

The output is the following, the results are consistent with those obtained with DROITEREG :

[ 4.35522612  0.18629772]
[ 0.07564571  0.01699926]

But this is not fully satisfactory because, this should be obtained easily from a simple least square function. So I tried to use polyfit.

(a, b), Mcov = np.polyfit(1 / So, 1 / vo, 1, cov=True)
print("a = ", a, " b = ", b)
print("SSR = ", sum([(y - (a * x + b))**2 for x, y in zip(1 / So, 1 / vo)]))
print("Cov mat\n", Mcov)
print("Cov mat diag          ", np.diag(Mcov))
print("sqrt 1/2 cov mat diag ", np.sqrt(0.5 * np.diag(Mcov)))

The output is :

a =  4.35522612104  b =  0.186297716685
SSR =  0.00398117627681
Cov mat
 [[ 0.01144455 -0.00167853]
 [-0.00167853  0.00057795]]
Cov mat diag           [ 0.01144455  0.00057795]
sqrt 1/2 cov mat diag  [ 0.07564571  0.01699926]

At the end, I noticed that the Mcov matrix from polyfit is 2 times the pcov matrix from curve_fit. If I try to do a fit with a larger degrees of the polynomial I saw that the factor is equal to the number of parameters.

I do not succeed using linregress from scipy.stats because I do not know how to obtain this covariance matrix of the parameters estimates. Again I also succeed using scipy.odr but it is again more complicated than the above solutions and this for a trivial linear regression. Maybe I missed something because I am not kind in statistic and I do not really understand the meaning of this covariance matrix.

Thus, what I would to know is the easiest way to obtain the parameters of the linear regression and the related uncertainties (correlation coefficient would be also a good point but it is more easy to compute it). My main objective is for example to give to a student in chemistry or physics the easy way to do this linear regression and compute the parameters associated to this model.

What is the easiest way to get uncertainties on linear regression parameters?

0 Answers0