2

I used to run the DROITEREG functions in a calc sheet. Here is an example :

enter image description here

At the top left, there are the data and at the bottom the results of the function DROITEREG which are a 2 by 5 table. I wrote the labels of several cells. a and b are the parameters of the linear regression and u(a) and u(b) the uncertaintes on a and b. I would like to compute these uncertaintes from a numpy functions.

I succeed from curve_fit function :

import numpy as np
from scipy.stats import linregress
from scipy.optimize import curve_fit

data_o = """
0.42    2.0
0.97    5.0
1.71   10.0
2.49   20.0
3.53   50.0
3.72  100.0
"""
vo, So = np.loadtxt(data_o.split("\n"), unpack=True)

def f_model(x, a, b):
    return a * x + b

popt, pcov = curve_fit(
    f=f_model,     # model function
    xdata=1 / So,  # x data
    ydata=1 / vo,  # y data
    p0=(1, 1),     # initial value of the parameters
)

# parameters
print(popt)
# uncertaintes :
print(np.sqrt(np.diag(pcov)))

The output is the following, the results are consistent with those obtained with DROITEREG :

[ 4.35522612  0.18629772]
[ 0.07564571  0.01699926]

But this is not fully satisfactory because, this should be obtained easily from a simple least square function. So I tried to use polyfit.

(a, b), Mcov = np.polyfit(1 / So, 1 / vo, 1, cov=True)
print("a = ", a, " b = ", b)
print("SSR = ", sum([(y - (a * x + b))**2 for x, y in zip(1 / So, 1 / vo)]))
print("Cov mat\n", Mcov)
print("Cov mat diag          ", np.diag(Mcov))
print("sqrt 1/2 cov mat diag ", np.sqrt(0.5 * np.diag(Mcov)))

The output is :

a =  4.35522612104  b =  0.186297716685
SSR =  0.00398117627681
Cov mat
 [[ 0.01144455 -0.00167853]
 [-0.00167853  0.00057795]]
Cov mat diag           [ 0.01144455  0.00057795]
sqrt 1/2 cov mat diag  [ 0.07564571  0.01699926]

At the end, I noticed that the Mcov matrix from polyfit is 2 times the pcov matrix from curve_fit. If I try to do a fit with a larger degrees of the polynomial I saw that the factor is equal to the number of parameters.

I do not succeed using linregress from scipy.stats because I do not know how to obtain this covariance matrix of the parameters estimates. Again I also succeed using scipy.odr but it is again more complicated than the above solutions and this for a trivial linear regression. Maybe I missed something because I am not kind in statistic and I do not really understand the meaning of this covariance matrix.

Thus, what I would to know is the easiest way to obtain the parameters of the linear regression and the related uncertainties (correlation coefficient would be also a good point but it is more easy to compute it). My main objective is for example to give to a student in chemistry or physics the easy way to do this linear regression and compute the parameters associated to this model.

Ger
  • 9,076
  • 10
  • 37
  • 48

0 Answers0