Linear regression slope error in numpy

Question

I use numpy.polyfit to get a linear regression: coeffs = np.polyfit(x, y, 1).

What is the best way to calculate the error of the fit's slope using numpy?

When you use `polyfit`, it returns the residuals of the fit. You can do a Chi2 on this to get the goodness of fit which I assume is what you are looking for. — ebarr, Dec 03 '14 at 23:53
The residuals are the absolute error for each point in the fit. — ebarr, Dec 04 '14 at 00:03

score 7 · Accepted Answer · answered Dec 04 '14 at 11:44

As already mentioned by @ebarr in the comments, you can use np.polyfit to return the residuals by using the keyword argument full=True.

Example:

x = np.array([0.0, 1.0, 2.0, 3.0,  4.0,  5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
z, residuals, rank, singular_values, rcond = np.polyfit(x, y, 3, full=True)

residuals then is the sum of least squares.

Alternatively, you can use the keyword argument cov=True to get the covariance matrix.

Example:

x = np.array([0.0, 1.0, 2.0, 3.0,  4.0,  5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
z, cov = np.polyfit(x, y, 3, cov=True)

Then, the diagonal elements of cov are the variances of the coefficients in z, i.e. np.sqrt(np.diag(cov)) gives you the standard deviations of the coefficients. You can use the standard deviations to estimate the probability that the absolute error exceeds a certain value, e.g. by inserting the standard deviations in the uncertainty propagation calculation. If you use e.g. 3*standard deviations in the uncertainty propagation, you calculate the error which will not be exceeded in 99.7% of the cases.

One last hint: you have to choose whether you choose full=True or cov=True. cov=True only works when full=False (default) or vice versa.

When I try to implement this solution, I get negative values along the diagonal of the co-variate matrix. — Michael Molter, Oct 15 '16 at 20:52

Linear regression slope error in numpy

1 Answers1