numpy covariance matrix

Question

Suppose I have two vectors of length 25, and I want to compute their covariance matrix. I try doing this with numpy.cov, but always end up with a 2x2 matrix.

>>> import numpy as np
>>> x=np.random.normal(size=25)
>>> y=np.random.normal(size=25)
>>> np.cov(x,y)
array([[ 0.77568388,  0.15568432],
       [ 0.15568432,  0.73839014]])

Using the rowvar flag doesn't help either - I get exactly the same result.

>>> np.cov(x,y,rowvar=0)
array([[ 0.77568388,  0.15568432],
       [ 0.15568432,  0.73839014]])

How can I get the 25x25 covariance matrix?

David Marx · Accepted Answer · 2013-02-23T02:22:18.230

13

You have two vectors, not 25. The computer I'm on doesn't have python so I can't test this, but try:

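z = list(zip(x, y))  # 25 (x_i, y_i) pairs; zip is lazy in Python 3, so materialize it
np.cov(z)            # each pair is treated as one variable, giving a 25x25 matrix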

Of course.... really what you want is probably more like:

n=100 # number of points in each vector
num_vects=25
vals=[]
for _ in range(num_vects):
    vals.append(np.random.normal(size=n))
np.cov(vals)

This takes the covariance (I think/hope) of num_vects 1xn vectors
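As a quick check of the shapes involved, here is a minimal sketch, assuming a recent NumPy. The seeded generator and the variable names are illustrative and not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Two variables with 25 observations each -> 2x2 covariance matrix
x = rng.normal(size=25)
y = rng.normal(size=25)
two_var_cov = np.cov(x, y)

# 25 variables with 100 observations each -> 25x25 covariance matrix
vals = rng.normal(size=(25, 100))  # one row per variable
many_var_cov = np.cov(vals)

print(two_var_cov.shape, many_var_cov.shape)
```

The size of the result follows the number of variables (rows), not the number of points per vector.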

edited Feb 23 '13 at 02:22

answered Feb 23 '13 at 02:15

David Marx


No, I only have 2 vectors, each with 25 points. The solution with zip does in fact produce a 25x25 matrix, I still have to figure out if it's what I was hoping to get. Thanks anyway :) – user13321 Feb 23 '13 at 02:28
4

If you have two vectors with 25 points, you probably just want a 2x2 covariance matrix. – David Marx Feb 23 '13 at 03:02

score 13 · Answer 2 · edited May 11 '15 at 02:38

13

Try this:

import numpy as np
x=np.random.normal(size=25)
y=np.random.normal(size=25)
z = np.vstack((x, y))
c = np.cov(z.T)
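A sketch of what this call computes, under the assumption that x and y are drawn as in the question: np.cov(z.T) treats each of the 25 positions as a variable observed only twice, so it should agree with np.cov(z, rowvar=False). With just two observations, the result reduces to a scaled outer product of x - y. The seeded generator and the names c_alt and outer are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=25)
y = rng.normal(size=25)

z = np.vstack((x, y))            # shape (2, 25)
c = np.cov(z.T)                  # rows of z.T are the 25 variables
c_alt = np.cov(z, rowvar=False)  # same computation without the transpose

# With only two observations per variable, the covariance collapses to
# an outer product of the difference vector, so c has rank one.
outer = np.outer(x - y, x - y) / 2

print(c.shape)
print(np.allclose(c, c_alt), np.allclose(c, outer))
```

This is one way to see why the 25x25 result carries very little information when only two vectors are available.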

edited May 11 '15 at 02:38

bernie2436


answered Feb 25 '13 at 13:55

Sylou


vstack() takes exactly 1 argument (2 given) – user1244215 Aug 08 '13 at 22:17
2

should be z = np.vstack((x, y)) – mrgloom Jan 17 '14 at 06:24

score 5 · Answer 3 · edited Jun 20 '20 at 09:12

Covariance matrix from sample vectors

There are two ways to define a covariance matrix from two N-dimensional vectors, and that is the source of the confusion here.

The question to ask yourself is whether you consider:

- each vector as N realizations/samples of one single variable (for example, two 3-dimensional vectors [X1,X2,X3] and [Y1,Y2,Y3], where you have 3 realizations each of the variables X and Y)
- each vector as 1 realization of N variables (for example, two 3-dimensional vectors [X1,Y1,Z1] and [X2,Y2,Z2], where each vector holds 1 realization of the variables X, Y and Z)

Since each entry of a covariance matrix is the covariance between two variables (with the variance of each variable on the diagonal):

- in the first case, you have 2 variables with N samples each, so you end up with a 2x2 matrix whose entries are computed from N samples per variable
- in the second case, you have N variables with 2 samples each, so you end up with an NxN matrix

About the actual question, using numpy

If you consider that each vector holds 25 variables (3 instead of 25 here, to simplify the example code), that is, one realization of several variables per vector, use rowvar=0:

import numpy

# [X1,Y1,Z1]
X_realization1 = [1,2,3]

# [X2,Y2,Z2]
X_realization2 = [2,1,8]

numpy.cov([X_realization1, X_realization2], rowvar=0) # rowvar false, each column is a variable

Code returns, considering 3 variables:

array([[ 0.5, -0.5,  2.5],
       [-0.5,  0.5, -2.5],
       [ 2.5, -2.5, 12.5]])

Otherwise, if you consider each vector to be 25 samples of one variable, use rowvar=1 (numpy's default):

# [X1,X2,X3]
X = [1,2,3]

# [Y1,Y2,Y3]
Y = [2,1,8]

numpy.cov([X,Y],rowvar=1) # rowvar true (default), each row is a variable

Code returns, considering 2 variables:

array([[ 1.        ,  3.        ],
       [ 3.        , 14.33333333]])

score 3 · Answer 4 · answered Feb 23 '13 at 02:16

3

According to the documentation, which you can read with

>>> print(np.cov.__doc__)

or in the NumPy reference page for numpy.cov, NumPy treats each row of the array as a separate variable, so you have two variables and hence you get a 2 x 2 covariance matrix.

I think the previous post has the right solution. I have the explanation :-)
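To illustrate the row-per-variable convention, here is a minimal sketch. It assumes that passing x and y as separate arguments is equivalent to stacking them as two rows, which the check below confirms for this data. The seeded generator and the name stacked are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=25)
y = rng.normal(size=25)

# Passing x and y separately is equivalent to stacking them as two rows
stacked = np.vstack((x, y))      # shape (2, 25): 2 variables, 25 observations
cov_xy = np.cov(x, y)
cov_stacked = np.cov(stacked)

print(cov_xy.shape)
print(np.allclose(cov_xy, cov_stacked))
# The diagonal holds the sample variances (ddof=1) of x and y
print(np.allclose(np.diag(cov_xy), [x.var(ddof=1), y.var(ddof=1)]))
```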

answered Feb 23 '13 at 02:16

Arcturus


this produces a (50, 50) matrix! – user13321 Feb 23 '13 at 02:21

score 2 · Answer 5 · answered Aug 05 '15 at 18:34

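I suppose what you're looking for is actually an autocovariance function, which is a function of the time lag. I compute the autocovariance like this:

import numpy as np

def autocovariance(Xi, N, k):
    # Biased autocovariance estimate of the length-N series Xi at lag k
    Xs = np.average(Xi)
    aCov = 0.0
    for i in np.arange(0, N-k):
        aCov = (Xi[(i+k)]-Xs)*(Xi[i]-Xs)+aCov
    return (1./(N))*aCov

# Usage from the author's own code: autocov, My_wector, N, h and i are defined elsewhere
autocov[i]=(autocovariance(My_wector, N, h))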
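The same estimate can be written without an explicit loop. The following is a sketch rather than the author's code: the function name autocovariance_vec, the sample data, and the choice to evaluate every lag are assumptions made for illustration. It keeps the division by N used in the loop version.

```python
import numpy as np

def autocovariance_vec(series, lag):
    """Biased autocovariance at a given lag (divides by N, like the loop version)."""
    values = np.asarray(series, dtype=float)
    n = len(values)
    centered = values - values.mean()
    # Pair each point with the point `lag` steps later and sum the products
    return np.dot(centered[lag:], centered[:n - lag]) / n

rng = np.random.default_rng(3)
data = rng.normal(size=25)
autocov = np.array([autocovariance_vec(data, h) for h in range(len(data))])

# At lag 0 the autocovariance is the (biased) variance of the series
print(np.isclose(autocov[0], data.var()))
```

Note that this describes how one series relates to shifted copies of itself, which is a different quantity from the covariance between x and y in the question.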

score 2 · Answer 6 · answered Nov 20 '17 at 13:43

2

You should change

np.cov(x,y, rowvar=0)

to

np.cov((x,y), rowvar=0)
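The difference between the two calls can be checked with a short sketch. The explanation in the comments reflects how np.cov behaves with 1-D inputs in the question's own output and in the check below; it is stated here as an observation, not quoted from the documentation. The seeded generator and the names separate and combined are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=25)
y = rng.normal(size=25)

# x and y passed separately: each 1-D array is taken as one variable,
# and rowvar makes no difference to the result
separate = np.cov(x, y, rowvar=False)

# x and y passed as one (2, 25) array-like: with rowvar=False the
# 25 columns become the variables
combined = np.cov((x, y), rowvar=False)

print(separate.shape, combined.shape)
print(np.allclose(separate, np.cov(x, y)))
```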

answered Nov 20 '17 at 13:43

FooBar167


score 2 · Answer 7 · answered Feb 01 '19 at 19:08

What you got (2 by 2) is more useful than 25 by 25. The covariance of X and Y is an off-diagonal entry in the symmetric covariance matrix.

If you insist on 25 by 25, which I think is useless, then why don't you write out the definition?

x=np.random.normal(size=25).reshape(25,1) # to make it 2d array.
y=np.random.normal(size=25).reshape(25,1)

cov =  np.matmul(x-np.mean(x), (y-np.mean(y)).T) / len(x)

score 0 · Answer 8 · answered Feb 23 '13 at 02:30

0

As pointed out above, you only have two vectors so you'll only get a 2x2 cov matrix.

IIRC the 2 main diagonal terms will be sum( (x-mean(x))**2) / (n-1) and similarly for y.

The 2 off-diagonal terms will be sum( (x-mean(x))*(y-mean(y)) ) / (n-1). n=25 in this case.
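These formulas can be checked against np.cov with a short sketch. The seeded generator and the names var_x, var_y, cov_xy and manual are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=25)
y = rng.normal(size=25)
n = len(x)

# Diagonal terms: sample variances with the n-1 denominator
var_x = np.sum((x - x.mean()) ** 2) / (n - 1)
var_y = np.sum((y - y.mean()) ** 2) / (n - 1)
# Off-diagonal term: sample covariance of x and y
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

manual = np.array([[var_x, cov_xy],
                   [cov_xy, var_y]])

print(np.allclose(manual, np.cov(x, y)))
```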

answered Feb 23 '13 at 02:30

Stuart


A covariance matrix, according to Wikipedia, is "a matrix whose element in the i, j position is the covariance between the i th and j th elements". If his i and j are 25, how can that lead to a 2x2 matrix?? – john k Nov 19 '17 at 01:22

andrewchan2022 · Answer 9 · 2019-04-29T17:41:05.260

According to the documentation, you should expect the variables to be arranged as a column vector:

If we examine N-dimensional samples, X = [x1, x2, ..., xn]^T

though later it says each row is a variable:

Each row of m represents a variable.

so you need to pass in the transpose of your matrix:

x=np.random.normal(size=25)
y=np.random.normal(size=25)
X = np.array([x,y])
np.cov(X.T)

and according to wikipedia: https://en.wikipedia.org/wiki/Covariance_matrix

X is column vector variable
X = [X1,X2, ..., Xn]^T
COV = E[X * X^T] - μx * μx^T   // μx = E[X]

you can implement it yourself:

# Each row of X is an observation and each column is a variable
X = X - X.mean(axis=0)
h,w = X.shape
COV = X.T @ X / (h-1)
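A sketch comparing this hand-written version with np.cov, assuming rows of X are observations and columns are variables. The array shape (50, 3), the seeded generator, and the names centered and manual_cov are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 3))   # 50 observations (rows) of 3 variables (columns)

centered = X - X.mean(axis=0)  # subtract each column's mean
h, w = X.shape
manual_cov = centered.T @ centered / (h - 1)

print(manual_cov.shape)
print(np.allclose(manual_cov, np.cov(X, rowvar=False)))
print(np.allclose(manual_cov, np.cov(X.T)))
```

Using rowvar=False or transposing the input are two ways of telling np.cov that the variables are in the columns.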

score -2 · Answer 10 · answered Jul 31 '16 at 03:34

-2

I don't think you understand the definition of a covariance matrix. If you need a 25 x 25 covariance matrix, you need 25 vectors, each with n data points.

answered Jul 31 '16 at 03:34

Edison Chen


2

no.. you just need 2 vectors length 25. Look at the wikipedia definition. – john k Nov 19 '17 at 01:24
