I used to run the DROITEREG
functions in a calc sheet. Here is an example :
At the top left, there are the data and at the bottom the results of the function DROITEREG
which are a 2 by 5 table. I wrote the labels of several cells. a and b are the parameters of the linear regression and u(a) and u(b) the uncertaintes on a and b. I would like to compute these uncertaintes from a numpy functions.
I succeed from curve_fit
function :
import numpy as np
from scipy.stats import linregress
from scipy.optimize import curve_fit
data_o = """
0.42 2.0
0.97 5.0
1.71 10.0
2.49 20.0
3.53 50.0
3.72 100.0
"""
vo, So = np.loadtxt(data_o.split("\n"), unpack=True)
def f_model(x, a, b):
return a * x + b
popt, pcov = curve_fit(
f=f_model, # model function
xdata=1 / So, # x data
ydata=1 / vo, # y data
p0=(1, 1), # initial value of the parameters
)
# parameters
print(popt)
# uncertaintes :
print(np.sqrt(np.diag(pcov)))
The output is the following, the results are consistent with those obtained with DROITEREG
:
[ 4.35522612 0.18629772]
[ 0.07564571 0.01699926]
But this is not fully satisfactory because, this should be obtained easily from a simple least square function. So I tried to use polyfit
.
(a, b), Mcov = np.polyfit(1 / So, 1 / vo, 1, cov=True)
print("a = ", a, " b = ", b)
print("SSR = ", sum([(y - (a * x + b))**2 for x, y in zip(1 / So, 1 / vo)]))
print("Cov mat\n", Mcov)
print("Cov mat diag ", np.diag(Mcov))
print("sqrt 1/2 cov mat diag ", np.sqrt(0.5 * np.diag(Mcov)))
The output is :
a = 4.35522612104 b = 0.186297716685
SSR = 0.00398117627681
Cov mat
[[ 0.01144455 -0.00167853]
[-0.00167853 0.00057795]]
Cov mat diag [ 0.01144455 0.00057795]
sqrt 1/2 cov mat diag [ 0.07564571 0.01699926]
At the end, I noticed that the Mcov matrix from polyfit
is 2 times the pcov matrix from curve_fit
. If I try to do a fit with a larger degrees of the polynomial I saw that the factor is equal to the number of parameters.
I do not succeed using linregress
from scipy.stats
because I do not know how to obtain this covariance matrix of the parameters estimates. Again I also succeed using scipy.odr
but it is again more complicated than the above solutions and this for a trivial linear regression. Maybe I missed something because I am not kind in statistic and I do not really understand the meaning of this covariance matrix.
Thus, what I would to know is the easiest way to obtain the parameters of the linear regression and the related uncertainties (correlation coefficient would be also a good point but it is more easy to compute it). My main objective is for example to give to a student in chemistry or physics the easy way to do this linear regression and compute the parameters associated to this model.