3

I am getting confused trying to run PCA on a set of spatial grids that have been read into numpy arrays. As arrays they look like this, where mdata[0] represents the set of rows and columns in a single grid, and each value represents a grid cell:

mdata = np.array([ [[3, 4, 6, 4],
                    [4, 9, 2, 7],
                    [8, 7, 2, 2]],

                   [[1, 3, 7, 9],
                    [3, 1, 8, 8],
                    [2, 6, 7, 5]],

                   [[1, 8, 5, 2],
                    [6, 4, 7, 3],
                    [3, 1, 4, 6]],

                   [[5, 8, 2, 4],
                    [3, 7, 7, 7],
                    [1, 4, 8, 2]],

                   [[4, 5, 6, 4],
                    [1, 7, 5, 7],
                    [2, 1, 3, 3]] ])

I then flatten the arrays so that the covariance can be properly calculated across the set, so the input array then looks like this:

[[3 4 6 4 4 9 2 7 8 7 2 2]
 [1 3 7 9 3 1 8 8 2 6 7 5]
 [1 8 5 2 6 4 7 3 3 1 4 6]
 [5 8 2 4 3 7 7 7 1 4 8 2]
 [4 5 6 4 1 7 5 7 2 1 3 3]]

I need to generate a set of output arrays (ultimately converted back to grids) representing the principal components. The output arrays result from multiplying the eigenvectors against each array element across the set of input grids (i.e., the input array), and summing them together. Each output grid will represent a principal component, in decreasing order of explained variance.

So the final equations will look something like this:

First principal component (uses first eigenvector array row??):
  (input_grid_1 * evec_row_1_val_1) + (input_grid_2 * evec_row_1_val_2) ... (input_grid_n * evec_row_1_val_n)

Second principal component (uses second eigenvector array row??):
  (input_grid_1 * evec_row_2_val_1) + (input_grid_2 * evec_row_2_val_2) ... (input_grid_n * evec_row_2_val_n)

etc...

My question is... after generating the eigenvectors via numpy.linalg.eig, I'm not sure of their ordering. Do the values in evec[0] (see output below) represent the eigenvectors I should use to calculate the first principal component output array? Or should I start with the bottom row (i.e., evec[11] below)? Or are the eigenvectors arranged column-wise in the output from numpy.linalg.eig?

Thank you for any help, and I apologize for the length of this.

Here is some sample code I'm using, and the resulting output:

print "flattened mdata:\n", mdata2, "\n"
mdata2 -= np.mean(mdata2, axis=0)
print "mean centered data:\n", mdata2, "\n"
covar = np.cov(mdata2, rowvar=0)
print "covariance matrix:\n", covar, "\n"
eval, evec = la.eig(covar)
print "eigenvectors:\n", evec, "\n"

flattened mdata:
[[3 4 6 4 4 9 2 7 8 7 2 2]
 [1 3 7 9 3 1 8 8 2 6 7 5]
 [1 8 5 2 6 4 7 3 3 1 4 6]
 [5 8 2 4 3 7 7 7 1 4 8 2]
 [4 5 6 4 1 7 5 7 2 1 3 3]]

mean centered data:
[[ 0 -1  0  0  0  3 -3  0  4  3 -2 -1]
 [-1 -2  1  4  0 -4  2  1 -1  2  2  1]
 [-1  2  0 -2  2 -1  1 -3  0 -2  0  2]
 [ 2  2 -3  0  0  1  1  0 -2  0  3 -1]
 [ 1  0  0  0 -2  1  0  0 -1 -2 -1  0]]

covariance matrix:
[[ 1.7   0.95 -1.65 -0.6  -1.    2.   -0.3   0.6  -1.   -0.55  0.65 -1.3 ]
 [ 0.95  3.2  -1.9  -3.1   1.    1.25  0.7  -1.9  -1.5  -2.8   0.9   0.2 ]
 [-1.65 -1.9   2.3   1.2   0.   -1.75 -0.15  0.05  1.25  0.6  -1.55  1.1 ]
 [-0.6  -3.1   1.2   4.8  -1.   -3.5   1.4   2.7  -1.    2.9   1.8  -0.1 ]
 [-1.    1.    0.   -1.    2.   -1.    0.5  -1.5   0.5   0.    0.5   1.  ]
 [ 2.    1.25 -1.75 -3.5  -1.    7.   -4.25 -0.25  3.25  0.25 -3.   -2.5 ]
 [-0.3   0.7  -0.15  1.4   0.5  -4.25  3.7  -0.15 -4.   -1.8   3.15  1.45]
 [ 0.6  -1.9   0.05  2.7  -1.5  -0.25 -0.15  2.3  -0.25  2.1   0.7  -1.15]
 [-1.   -1.5   1.25 -1.    0.5   3.25 -4.   -0.25  5.5   3.   -3.75 -0.75]
 [-0.55 -2.8   0.6   2.9   0.    0.25 -1.8   2.1   3.    5.2  -0.1  -1.3 ]
 [ 0.65  0.9  -1.55  1.8   0.5  -3.    3.15  0.7  -3.75 -0.1   4.3   0.15]
 [-1.3   0.2   1.1  -0.1   1.   -2.5   1.45 -1.15 -0.75 -1.3   0.15  1.7 ]]

eigenvectors:
[[-0.05226275 +0.00000000e+00j  0.14243804 +0.00000000e+00j
   0.40669907 +0.00000000e+00j -0.08874805 +0.00000000e+00j
   0.24983235 +0.00000000e+00j -0.07788035 +0.00000000e+00j
  -0.20417938 +8.64129842e-02j -0.20417938 -8.64129842e-02j
   0.17142025 -1.53009928e-02j  0.17142025 +1.53009928e-02j
  -0.07359380 -5.42353843e-02j -0.07359380 +5.42353843e-02j]
 [ 0.01313939 +0.00000000e+00j  0.46207747 +0.00000000e+00j
   0.04053819 +0.00000000e+00j  0.24412764 +0.00000000e+00j
  -0.29669270 +0.00000000e+00j -0.11848560 +0.00000000e+00j
  -0.00129189 -1.08527235e-01j -0.00129189 +1.08527235e-01j
   0.24951441 -4.06744086e-02j  0.24951441 +4.06744086e-02j
  -0.08613229 -4.16143549e-02j -0.08613229 +4.16143549e-02j]
 [ 0.01212470 +0.00000000e+00j -0.23533069 +0.00000000e+00j
  -0.38627976 +0.00000000e+00j -0.29050062 +0.00000000e+00j
  -0.04031410 +0.00000000e+00j  0.11914628 +0.00000000e+00j
  -0.03397920 -4.04899870e-01j -0.03397920 +4.04899870e-01j
  -0.02317506 +1.11224412e-01j -0.02317506 -1.11224412e-01j
   0.16309959 +1.28363333e-02j  0.16309959 -1.28363333e-02j]
 [ 0.25764595 +0.00000000e+00j -0.49039161 +0.00000000e+00j
   0.17582472 +0.00000000e+00j -0.08694414 +0.00000000e+00j
  -0.49772683 +0.00000000e+00j  0.04915654 +0.00000000e+00j
  -0.22312499 +1.60190725e-02j -0.22312499 -1.60190725e-02j
  -0.13318403 -2.09741709e-01j -0.13318403 +2.09741709e-01j
  -0.06711105 +1.03673483e-01j -0.06711105 -1.03673483e-01j]
 [ 0.04158039 +0.00000000e+00j  0.08806516 +0.00000000e+00j
  -0.28689676 +0.00000000e+00j  0.55985795 +0.00000000e+00j
   0.16139009 +0.00000000e+00j -0.31020587 +0.00000000e+00j
   0.06748141 -1.77634044e-02j  0.06748141 +1.77634044e-02j
  -0.06824769 -3.42630421e-01j -0.06824769 +3.42630421e-01j
  -0.01037536 -1.65831604e-01j -0.01037536 +1.65831604e-01j]
 [-0.55737468 +0.00000000e+00j  0.20542936 +0.00000000e+00j
   0.32117797 +0.00000000e+00j -0.04132469 +0.00000000e+00j
  -0.38616878 +0.00000000e+00j -0.07888938 +0.00000000e+00j
  -0.18663356 -2.26138514e-01j -0.18663356 +2.26138514e-01j
  -0.48555065 +0.00000000e+00j -0.48555065 +0.00000000e+00j
  -0.05777089 +8.39094379e-02j -0.05777089 -8.39094379e-02j]
 [ 0.44645722 +0.00000000e+00j  0.09345400 +0.00000000e+00j
  -0.01958382 +0.00000000e+00j  0.00350717 +0.00000000e+00j
  -0.49307361 +0.00000000e+00j -0.19705809 +0.00000000e+00j
  -0.26393665 +1.65676777e-01j -0.26393665 -1.65676777e-01j
  -0.07097582 +1.77803457e-01j -0.07097582 -1.77803457e-01j
   0.31461793 -8.41424299e-02j  0.31461793 +8.41424299e-02j]
 [ 0.03562756 +0.00000000e+00j -0.29183500 +0.00000000e+00j
   0.34969990 +0.00000000e+00j -0.16770853 +0.00000000e+00j
   0.18499349 +0.00000000e+00j -0.27031753 +0.00000000e+00j
   0.04149287 -1.82755302e-01j  0.04149287 +1.82755302e-01j
   0.27944344 -1.37299309e-01j  0.27944344 +1.37299309e-01j
  -0.09637630 -4.39851477e-01j -0.09637630 +4.39851477e-01j]
 [-0.46659969 +0.00000000e+00j -0.23849866 +0.00000000e+00j
  -0.26877609 +0.00000000e+00j  0.24093339 +0.00000000e+00j
  -0.35050288 +0.00000000e+00j  0.44403480 +0.00000000e+00j
  -0.43610595 +0.00000000e+00j -0.43610595 +0.00000000e+00j
   0.37131095 +1.81269636e-01j  0.37131095 -1.81269636e-01j
   0.29146786 -2.22035640e-01j  0.29146786 +2.22035640e-01j]
 [-0.13939660 +0.00000000e+00j -0.51461522 +0.00000000e+00j
   0.15383834 +0.00000000e+00j  0.51033734 +0.00000000e+00j
   0.13867222 +0.00000000e+00j -0.36401539 +0.00000000e+00j
   0.19655716 +8.68943320e-02j  0.19655716 -8.68943320e-02j
  -0.16900231 +9.74521096e-02j -0.16900231 -9.74521096e-02j
  -0.20194713 +1.85123869e-01j -0.20194713 -1.85123869e-01j]
 [ 0.39116291 +0.00000000e+00j  0.04835035 +0.00000000e+00j
   0.32293099 +0.00000000e+00j  0.42167036 +0.00000000e+00j
  -0.04351453 +0.00000000e+00j  0.61425451 +0.00000000e+00j
  -0.20683758 -3.83249003e-01j -0.20683758 +3.83249003e-01j
  -0.02544963 +2.31011753e-01j -0.02544963 -2.31011753e-01j
   0.13276711 -1.26161298e-04j  0.13276711 +1.26161298e-04j]
 [ 0.16560064 +0.00000000e+00j  0.04924581 +0.00000000e+00j
  -0.38012669 +0.00000000e+00j -0.03145605 +0.00000000e+00j
   0.06088502 +0.00000000e+00j -0.20480777 +0.00000000e+00j
  -0.23192947 -1.44999255e-01j -0.23192947 +1.44999255e-01j
  -0.27861106 -9.88363770e-05j -0.27861106 +9.88363770e-05j
  -0.60558363 +0.00000000e+00j -0.60558363 +0.00000000e+00j]]
vulture
  • 731
  • 2
  • 7
  • 10

1 Answers1

3

Eigenvectors are arranged column-wise. E.g. if you get a matrix of eigenvectors [[0, 1], [-1, 0]]

your eigenvectors are V1 = [0, -1]; V2 = [1, 0].

Just try it on Fibonacci numbers:

>>> fib = numpy.array([[1, 1],[1, 0]])
>>> numpy.linalg.eig(fib)
(array([ 1.61803399, -0.61803399]), array([[ 0.85065081, -0.52573111],
       [ 0.52573111,  0.85065081]]))

The first array is eigenvalues, the next one contains eigenvectors in columns. Compare to: http://mathproofs.blogspot.com/2005/04/nth-term-of-fibonacci-sequence.html

Good luck!

Boris Burkov
  • 13,420
  • 17
  • 74
  • 109
  • 1
    Thanks...so column-wise it is. I also found a post (http://stackoverflow.com/a/8093043/1238926) where argsort() was used to index and sort the eigenvalue and eigenvector arrays. I would like to do this to keep track of the eval/evec association, but after some testing it's not clear to me after sorting how to know which eigenvector column to use for calculating the first principal component, then the second, etc. Hopefully this is just my ignorance and there is an easy pattern to follow. Any tips? – vulture Sep 12 '12 at 23:56
  • stackoverflow.com/a/8093043/1238926 - useful post, thanks. Well, the main eigenvalue has the largest module, so the corresponding vector is the first principal component. So you could just >>> from numpy import abs, linalg, array >>> absolutes = abs(array(eigenvalues)) >>> for index in reversed(argsort(absolutes)): >>> next_component = eigenvectors[index] Love to see, that someone plays with linear algebra as I do. :) – Boris Burkov Sep 13 '12 at 08:28
  • I know the magnitude of the eigenvalues determines their relative importance, but I thought both positive and negative values were considered? Are you saying eigenvalues should be converted to absolute values before sorting? The examples I've looked at (such as the one I linked to above) don't appear to run abs() on the eigenvalues prior to sorting. Thanks for the help. – vulture Sep 13 '12 at 16:33
  • Eigenvalues can be positive, negative or complex and, of course, we are interested in their absolute value (in case of complex eigenvalues it's non-trivial thing). Your matrix is symmetric, thus your eigenvalues are real-numbered and eigenvectors are orthogonal, which justifies the application of EVD :) Let's call your covariance matrix C. Arbitrary vector V can be decomposed in the basis of eigenvectors of your matrix as V = c1 * eigenvector1 + c2 * eigenvector2 + cn * eigenvectorn; C * V = c1*lambda1*eigenvector1 + ... + cn*lambda_n*eigenvector_n (where lambda_i is eigenvalue for eigenvec_i) – Boris Burkov Sep 13 '12 at 18:15
  • C^N * V = c1 * lambda1^N * eigenvector1 + ... c_n * lambda_n^N * eigenvectorN. If N = 1000, lambda1 = 1, lambda2 = 0.99, lambda1^1000 is ~100000 greater than lambda2^1000, thus coefficient for the eigenvector1 is much much greater (in absolute value) than coefficient for eigenvector2. That's the main purpose of eigenvectors and eigenvalues aside from PCA. As for PCA someone just decided, that if there are several dimensions, along which the dispersion is stretched, eigenvectors are those dimensions. Still, eigenvectors are arranged in the order of decreasing respective absolute eigenvalues. – Boris Burkov Sep 13 '12 at 18:24
  • @vulture (this is technical comment, so that you get notification) – Boris Burkov Sep 13 '12 at 18:27