
I am able to calculate the Pearson correlation between two lists:

from scipy.stats import pearsonr

List1 = [1, 2, 3, 4, 5]
List2 = [2, 3, 4, 5, 6]
# pearsonr returns a (correlation coefficient, p-value) tuple
pearson = pearsonr(List1, List2)
print("pearson correlation: " + str(pearson))

I would like a list of the observed - expected values for List1. Would someone know how to extend this code to print the observed - expected values, please?

Given the documentation for this test: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.pearsonr.html I'm not sure whether I can obtain these values using this method, or whether something else is more suited.

Edit: I can now fit a linear regression model:

from scipy import stats

List1 = [1, 2, 3, 4, 5]
List2 = [1, 3, 5, 6, 7]

# linregress returns the slope, intercept, correlation coefficient,
# p-value and standard error of the fitted line
slope, intercept, r_value, p_value, std_err = stats.linregress(List1, List2)

Added note:

I assumed that, once I had fitted a model as kindly suggested, I would be able to work out how to obtain the residuals from linregress.

However, when I call:

>>> dir(stats.linregress)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']

I can see that there's nothing there that resembles .resid / .residuals etc. Could someone point me in the right direction for the next step? The aim is to calculate a list of the observed - expected values for List1 (see above).

Correction:

from scipy import stats

x = [28, 26, 44, 40, 10, 7, 27, 25, 26, 10]
y = [0.055, 0.074, 0.049, 0.067, 0.037, 0.036, 0.044, 0.041, 0.071, 0.03]
print(stats.linregress(x, y))

Gives me this:

(0.00075454346398073121, 0.032064593825268217, 0.59378410770471368, 0.07031502216706445, 0.00036149633561360087)

which I assume is the residuals. Many thanks.

*******CORRECTION******

0.00075454346398073121 is the slope (m), and 0.032064593825268217 is the intercept (c).
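
For clarity, the same values can be obtained by unpacking the linregress result into named variables; a short sketch reusing the x and y lists above:

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print(slope)      # 0.00075454...
print(intercept)  # 0.03206459...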

To obtain the raw residuals for the first two data points in lists x and y (see above), use y = mx + c to predict y from each observed x, then subtract the prediction from the observed y:

x observed = [28, 26]
y observed = [0.055, 0.074]

Point 1: y observed = 0.055; y predicted = 0.000754*28 + 0.03206 = 0.053172
Point 1 residual = 0.055 - 0.053172 = 0.0018

Point 2: y observed = 0.074; y predicted = 0.000754*26 + 0.03206 = 0.051664
Point 2 residual = 0.074 - 0.051664 = 0.0223
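
The same calculation can be applied to every data point at once; a short sketch continuing from the unpacked slope and intercept above:

predicted = [slope * xi + intercept for xi in x]        # expected y for each observed x
residuals = [yi - pi for yi, pi in zip(y, predicted)]   # observed y - expected y
print(residuals)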

Many thanks.

Mea
  • That is not the residual, it's a tuple with (slope, intercept, r-value, p-value, stderr). To get the residuals, you need to apply the model y=mx+b where m is the slope and b is the intercept, and subtract it from the "raw" data points. – Benjamin Dec 02 '15 at 18:10
  • Many thanks. I've added the correction and an example to make sure I understand correctly. – Mea Dec 03 '15 at 12:01
  • Still not quite there. You should consult a good stats text or website for a basic explanation of linear regression. The tools won't help much unless you understand what the outputs mean and how to use them. – Benjamin Dec 04 '15 at 00:25

1 Answer


scipy.stats.pearsonr returns a tuple, which you can unpack:

corcoef, pval = scipy.stats.pearsonr(List1, List2)

What you are after is a linear regression: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html#scipy.stats.linregress

You can then use the model to calculate the residuals.
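
A minimal sketch of that last step, reusing List1 and List2 from the question's edit and treating List1 as the x values and List2 as the observed y values:

from scipy import stats

List1 = [1, 2, 3, 4, 5]   # x values
List2 = [1, 3, 5, 6, 7]   # observed y values

slope, intercept, r_value, p_value, std_err = stats.linregress(List1, List2)

# residual = observed y - predicted y, i.e. the observed - expected values
residuals = [y - (slope * x + intercept) for x, y in zip(List1, List2)]
print(residuals)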

Benjamin
  • I think I fixed it, many thanks. The answer, in case other people are interested, is in the edit above. – Mea Dec 02 '15 at 14:00