What is the difference between MATLAB/Octave corr and Python numpy.correlate?

Question

I am trying to port a MATLAB/Octave program to Python using NumPy 1.8.0 and Python 2.7.3. I've used this reference as help in converting MATLAB functions to NumPy methods with great success, until I get to the point where I want to compute the correlation between two matrices.

The first matrix is 40000x25 floats, the second matrix is 40000x1 ints. In Octave I use the statement corr(a,b) and get a 25x1 matrix of floats. Trying the corresponding method in NumPy (numpy.correlate(a,b)) produces an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_1a9aa5a_20130415-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 751, in correlate
  return multiarray.correlate2(a,v,mode)
ValueError: object too deep for desired array

I can get it to work if I change the code to calculate a correlation for each column of a, like so:

for i in range(25):
    c2[i] = numpy.correlate(a[:,i], b)

However, the values in the c2 array are different than the output from Octave. Octave returns a 25x1 matrix of floats all less than 1. The values I get from NumPy are floats between -270 and 900.

I have tried to understand what the two algorithms are doing under the hood but have failed miserably. Can someone point out my logic failure?

score 8 · Accepted Answer · edited May 23 '17 at 11:43

8

It appears that there exists a numpy.corrcoef which computes the correlation coefficients, as desired. However, its interface is different from the Octave/Matlab corr.

First of all, by default, the function treats rows as variables, with the columns being observations. To mimic the behavior of Octave/Matlab, you can pass a flag which reverses this.

Also, according to this answer, the numpy.cov function (which corrcoef uses internally, I assume) returns a 2x2 matrix, each of which contain a specific covariance:

cov(a,a)  cov(a,b)

cov(a,b)  cov(b,b)

As he points out, the [0][1] element is what you'd want for cov(a,b). Thus, perhaps something like this will work:

for i in range(25):
    c2[i] = numpy.corrcoef(a[:,i], b, rowvar=0)[0][1]

For reference, here are some excerpts of the two functions that you had tried. It seems to be that they perform completely different things.

Octave:

— Function File: corr (x, y)

Compute matrix of correlation coefficients.

If each row of x and y is an observation and each column is a variable, then the (i, j)-th entry of corr (x, y) is the correlation between the i-th variable in x and the j-th variable in y.
      corr (x,y) = cov (x,y) / (std (x) * std (y))
If called with one argument, compute corr (x, x), the correlation between the columns of x.

And Numpy:

numpy.correlate(a, v, mode='valid', old_behavior=False)[source]

Cross-correlation of two 1-dimensional sequences.

This function computes the correlation as generally defined in signal processing texts:
z[k] = sum_n a[n] * conj(v[n+k])
with a and v sequences being zero-padded where necessary and conj being the conjugate.

edited May 23 '17 at 11:43

Community

1
1

answered May 22 '13 at 18:25

voithos

68,482
12
101
116

I have tried that but `numpy.cov(a,b)` returns a 2x2 array of floats and I don't know how that relates to the correlation. – Crystal May 22 '13 at 18:40
or try to simply get [Pearson's r](http://docs.scipy.org/doc/scipy-0.12.0/reference/generated/scipy.stats.pearsonr.html) – Rasman May 22 '13 at 18:52
or [this](http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html) – Rasman May 22 '13 at 18:53
@Rasman - I have tried `corrcoef`, but it also returns a 2x2 array that I don't know what to do with. I just tried `pearsonsr` and have the same questions. While these values look much better than what I'm getting from `correlate`, I'm still not getting the same answers as the octave function. Should I expect to? – Crystal May 22 '13 at 19:02
@Crystal: I think I may have found what the 2x2 array is all about. I've edited my answer to give a possible solution. – voithos May 22 '13 at 19:03
@voithos: thanks. I'm still not getting the same values as the octave output, but it's looking much better. – Crystal May 22 '13 at 19:15
@Crystal: Do you have any (small) example values that you could show? – voithos May 22 '13 at 19:21
@voithos: I just ran the full program. While the numbers don't match perfectly, the result looks ok - I'm getting a spike on my plot at the value I expect to. I'll try to study this to understand a bit more of what's really going on, but I'm marking this answered. Thanks! – Crystal May 22 '13 at 19:25
@Crystal: If you figure this out, I'd encourage you to write up an answer yourself so that others may benefit in the future. – voithos May 22 '13 at 19:30
1

@voithos: Just realized my mistake. I had a one-off error in my expected value (forgot that matlab/octave starts at 1). Once I made that correction, your solution gives me correlation values nearly identical to octave. And the plot looks exactly the same. THANKS! – Crystal May 22 '13 at 19:57

What is the difference between MATLAB/Octave corr and Python numpy.correlate?

1 Answers1