
I want to find the covariance of a 10304*280 matrix (i.e. 280 variables, each with 10304 subjects), and I am using the following numpy function to compute it:

cov = numpy.cov(matrix)

I expected a 280*280 matrix as a result, but it returned a 10304*10304 matrix.

  • try swapping axes first? `np.swapaxes()` – kevinkayaks Sep 20 '18 at 18:20
  • read about it in the numpy docs. You either want to specify the axes in `np.cov` (not sure if you can do this, but check) or you want to swap the axes before you calculate covariance. Probably your problem is that the covariance is being calculated on the wrong axis of `matrix` – kevinkayaks Sep 20 '18 at 18:24
  • I think you spotted it correctly, but how do I specify the axes in np.cov? – Syed Fahad Bukhari Sep 20 '18 at 18:37

2 Answers


Here is what the numpy.cov(m, y=None, ...) documentation says:

m : array_like A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables...
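To make the row/column convention concrete, here is a minimal sketch on a tiny array. The array `x` and the names `cov_rows` and `cov_cols` are illustrative assumptions, not part of the question.

```python
import numpy as np

# Two variables (rows), three observations each (columns).
x = np.array([[0, 1, 2],
              [2, 1, 0]])

# Default behaviour: each row is a variable, so the result is 2 x 2.
# The two rows are perfectly anti-correlated: [[1, -1], [-1, 1]].
cov_rows = np.cov(x)

# Passing the transpose makes numpy treat the 3 observations as variables,
# so the result is 3 x 3, which is the kind of surprise seen in the question.
cov_cols = np.cov(x.T)

print(cov_rows.shape, cov_cols.shape)
```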

So if your matrix contains 280 variables with 10304 samples each, it is supposed to be a 280*10304 matrix rather than a 10304*280 one. The simplest solution is the one others have suggested: swap the axes before computing the covariance.

import numpy
swap_matrix = numpy.swapaxes(matrix, 0, 1)  # 280 x 10304: one row per variable
cov = numpy.cov(swap_matrix)  # 280 x 280 covariance matrix
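As a self-contained sketch with the shape from the question, assuming random placeholder data in place of the real matrix (the names `rng`, `matrix`, and `swapped` are illustrative):

```python
import numpy as np

# Placeholder data with the question's shape: 10304 subjects x 280 variables.
rng = np.random.default_rng(0)
matrix = rng.random((10304, 280))

# np.swapaxes needs the two axes to exchange; for a 2-D array
# this is the same as taking matrix.T.
swapped = np.swapaxes(matrix, 0, 1)

# Each row of `swapped` is now one variable, as np.cov expects by default.
cov = np.cov(swapped)

print(swapped.shape, cov.shape)  # (280, 10304) (280, 280)
```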
Rui Zheng

As suggested in the previous answer, you can change the orientation of your array. An easy way to do this in 2D is simply to transpose the matrix:

import numpy as np
r = np.random.rand(100, 10)
np.cov(r).shape # is (100,100)
np.cov(r.T).shape # is (10,10)

But you can also pass the rowvar flag, which is described in the numpy.cov documentation:

import numpy as np
r = np.random.rand(100, 10)
np.cov(r).shape # is (100,100)
np.cov(r, rowvar=False).shape # is (10,10)

I think especially for large matrices this might be more performant, since you avoid the swapping/transposing of axes.

UPDATE:

I thought about this and wondered if the algorithm is actually different depending on rowvar == True or rowvar == False. Well, as it turns out, if you change the rowvar flag, numpy simply transposes the array itself :P.

Look here.

So, in terms of performance, nothing will change between the two versions.
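A quick way to convince yourself is the small check below. It is only a sketch: the array `r` is random placeholder data, and `np.shares_memory` is used here to illustrate that a transpose is a view of the same buffer rather than a copy.

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.random((100, 10))  # 100 observations of 10 variables

cov_transposed = np.cov(r.T)          # transpose explicitly
cov_rowvar = np.cov(r, rowvar=False)  # let numpy treat columns as variables

# Both routes give the same 10 x 10 covariance matrix.
same = np.allclose(cov_transposed, cov_rowvar)

# r.T does not copy any data; it is a view on the same memory as r,
# which is why transposing by hand carries no meaningful extra cost.
is_view = np.shares_memory(r, r.T)

print(same, is_view)
```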

lhk