7

I am trying to port a MATLAB/Octave program to Python using NumPy 1.8.0 and Python 2.7.3. I've used this reference as help in converting MATLAB functions to NumPy methods with great success, until I get to the point where I want to compute the correlation between two matrices.

The first matrix is 40000x25 floats, the second matrix is 40000x1 ints. In Octave I use the statement corr(a,b) and get a 25x1 matrix of floats. Trying the corresponding method in NumPy (numpy.correlate(a,b)) produces an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_1a9aa5a_20130415-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 751, in correlate
  return multiarray.correlate2(a,v,mode)
ValueError: object too deep for desired array

I can get it to work if I change the code to calculate a correlation for each column of a, like so:

for i in range(25):
    c2[i] = numpy.correlate(a[:,i], b)

However, the values in the c2 array are different than the output from Octave. Octave returns a 25x1 matrix of floats all less than 1. The values I get from NumPy are floats between -270 and 900.

I have tried to understand what the two algorithms are doing under the hood but have failed miserably. Can someone point out my logic failure?

sodd
  • 12,482
  • 3
  • 54
  • 62
Crystal
  • 137
  • 1
  • 2
  • 7

1 Answers1

8

It appears that there exists a numpy.corrcoef which computes the correlation coefficients, as desired. However, its interface is different from the Octave/Matlab corr.

First of all, by default, the function treats rows as variables, with the columns being observations. To mimic the behavior of Octave/Matlab, you can pass a flag which reverses this.

Also, according to this answer, the numpy.cov function (which corrcoef uses internally, I assume) returns a 2x2 matrix, each of which contain a specific covariance:

cov(a,a)  cov(a,b)

cov(a,b)  cov(b,b)

As he points out, the [0][1] element is what you'd want for cov(a,b). Thus, perhaps something like this will work:

for i in range(25):
    c2[i] = numpy.corrcoef(a[:,i], b, rowvar=0)[0][1]

For reference, here are some excerpts of the two functions that you had tried. It seems to be that they perform completely different things.

Octave:

— Function File: corr (x, y)

Compute matrix of correlation coefficients.

If each row of x and y is an observation and each column is a variable, then the (i, j)-th entry of corr (x, y) is the correlation between the i-th variable in x and the j-th variable in y.

      corr (x,y) = cov (x,y) / (std (x) * std (y))

If called with one argument, compute corr (x, x), the correlation between the columns of x.

And Numpy:

numpy.correlate(a, v, mode='valid', old_behavior=False)[source]

Cross-correlation of two 1-dimensional sequences.

This function computes the correlation as generally defined in signal processing texts:

z[k] = sum_n a[n] * conj(v[n+k])

with a and v sequences being zero-padded where necessary and conj being the conjugate.

Community
  • 1
  • 1
voithos
  • 68,482
  • 12
  • 101
  • 116
  • I have tried that but `numpy.cov(a,b)` returns a 2x2 array of floats and I don't know how that relates to the correlation. – Crystal May 22 '13 at 18:40
  • or try to simply get [Pearson's r](http://docs.scipy.org/doc/scipy-0.12.0/reference/generated/scipy.stats.pearsonr.html) – Rasman May 22 '13 at 18:52
  • or [this](http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html) – Rasman May 22 '13 at 18:53
  • @Rasman - I have tried `corrcoef`, but it also returns a 2x2 array that I don't know what to do with. I just tried `pearsonsr` and have the same questions. While these values look much better than what I'm getting from `correlate`, I'm still not getting the same answers as the octave function. Should I expect to? – Crystal May 22 '13 at 19:02
  • @Crystal: I think I may have found what the 2x2 array is all about. I've edited my answer to give a possible solution. – voithos May 22 '13 at 19:03
  • @voithos: thanks. I'm still not getting the same values as the octave output, but it's looking much better. – Crystal May 22 '13 at 19:15
  • @Crystal: Do you have any (small) example values that you could show? – voithos May 22 '13 at 19:21
  • @voithos: I just ran the full program. While the numbers don't match perfectly, the result looks ok - I'm getting a spike on my plot at the value I expect to. I'll try to study this to understand a bit more of what's really going on, but I'm marking this answered. Thanks! – Crystal May 22 '13 at 19:25
  • @Crystal: If you figure this out, I'd encourage you to write up an answer yourself so that others may benefit in the future. – voithos May 22 '13 at 19:30
  • 1
    @voithos: Just realized my mistake. I had a one-off error in my expected value (forgot that matlab/octave starts at 1). Once I made that correction, your solution gives me correlation values nearly identical to octave. And the plot looks exactly the same. THANKS! – Crystal May 22 '13 at 19:57