
When using Python 2.7.5 with OpenCV (OSX), I run PCA on a sequence of images (cols are pixels, rows are frames, as per this answer).

How do I get the eigenvalues corresponding to the eigenvectors? Looks like it's a property of the PCA object in C++, but the Python equivalent PCACompute() is a simple function.

Seems strange to omit such a key part of PCA.

benxyzzy
  • I did try your code and get similar results on some image data, but the difference between the eigenvectors for the two methods is on the order of 1e-9, except for the last ones, for which I get pretty low eigenvalues. Could be numerical precision... – Andrzej Pronobis May 24 '15 at 06:18
  • I'm inclined to agree - this question was posted before I was familiar with numerical computation and matrix decomposition. – benxyzzy May 24 '15 at 09:56

1 Answer


matmul.cpp confirms PCA::operator() is used by PCACompute(), but the eigenvalues are discarded. So I did this:

import cv2
import numpy

# The following mimics the PCA::operator() implementation from OpenCV's
# matmul.cpp, which is wrapped by Python's cv2.PCACompute(). We can't
# use PCACompute() itself, though, as it discards the eigenvalues.

# Scrambled is faster for nVariables >> nObservations. Bitmask is 0 and
# therefore default / redundant, but included to abide by the online docs.
covar, mean = cv2.calcCovarMatrix(PCAInput, cv2.cv.CV_COVAR_SCALE |
                                            cv2.cv.CV_COVAR_ROWS  |
                                            cv2.cv.CV_COVAR_SCRAMBLED)

eVal, eVec = cv2.eigen(covar, computeEigenvectors=True)[1:]

# Conversion + normalisation required due to 'scrambled' mode
eVec = cv2.gemm(eVec, PCAInput - mean, 1, None, 0)
# apply_along_axis() slices 1D rows, but normalize() returns 4x1 vectors
eVec = numpy.apply_along_axis(lambda n: cv2.normalize(n).flat, 1, eVec)

(Simplifying assumptions: rows are observations, cols are variables, and there are many more variables than observations. Both are true in my case.)
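To illustrate why the "scrambled" mode works under those assumptions, here is a numpy-only sketch (not the answer's actual OpenCV code; the array names `X`, `A`, `eVal`, `eVec` are illustrative): with n observations and p >> n variables, you eigendecompose the small n x n Gram matrix instead of the p x p covariance matrix, then map the eigenvectors back, which is exactly the conversion-plus-normalisation step the snippet above performs.

```python
import numpy as np

# Hypothetical data: 5 "frames" of 100 "pixels" each
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))

mean = X.mean(axis=0)
A = X - mean

# "Scrambled" covariance: the small 5 x 5 matrix A A^T / n
# instead of the huge 100 x 100 matrix A^T A / n
small_cov = A @ A.T / X.shape[0]

w, v = np.linalg.eigh(small_cov)      # ascending eigenvalues
order = np.argsort(w)[::-1]           # sort descending
eVal, v = w[order], v[:, order]

# If (A A^T) u = lambda u, then (A^T A)(A^T u) = lambda (A^T u), so each
# row u_i^T A is an (unnormalised) eigenvector of the full covariance.
eVec = v.T @ A
keep = eVal > 1e-10 * eVal.max()      # drop near-null components
eVal, eVec = eVal[keep], eVec[keep]
eVec /= np.linalg.norm(eVec, axis=1, keepdims=True)
```

Note the near-zero eigenvalue has to be dropped before normalising: its mapped vector has (numerically) zero length, which is one source of the instability in the low-eigenvalue rows discussed below.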

This pretty much works. In the following, old_eVec is the result from cv2.PCACompute():

In [101]: eVec
Out[101]: 
array([[  3.69396088e-05,   1.66745325e-05,   4.97117583e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -7.23531536e-06,  -3.07411122e-06,  -9.58259793e-06, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  1.01496237e-05,   4.60048715e-06,   1.33919606e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       ..., 
       [ -1.42024751e-04,   5.21386198e-05,   3.59923394e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -5.28685812e-05,   8.50139472e-05,  -3.13278542e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  2.96546917e-04,   1.23437674e-04,   4.98598461e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00]])

In [102]: old_eVec
Out[102]: 
array([[  3.69395821e-05,   1.66745194e-05,   4.97117981e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -7.23533140e-06,  -3.07411415e-06,  -9.58260534e-06, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  1.01496662e-05,   4.60050160e-06,   1.33920075e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       ..., 
       [ -1.42029530e-04,   5.21366564e-05,   3.60067672e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -5.29163444e-05,   8.50261567e-05,  -3.13150231e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -7.13724992e-04,  -8.52700090e-04,   1.57953508e-03, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00]], dtype=float32)

There is some loss of precision, visible toward the end of the outputs (though a quick plot of the absolute difference reveals no pattern to the imprecision).

57% of elements have a nonzero absolute difference.
Of these, 95% differ by less than 2e-16, and the mean absolute difference is 5.3e-4. However, the absolute difference can be as high as 0.059, which is a lot when you consider that all eigenvector values lie between -0.048 and 0.045.

There is code in PCA::operator() which converts to the largest ctype; on the other hand, old_eVec was float32, while my own code produces float64. It's also worth mentioning that I got some precision-related errors when compiling numpy.

Overall, the loss of precision seems to be associated with eigenvectors that have low eigenvalues, which again points to rounding error. The implementation above duplicates PCACompute()'s behaviour and produces near-identical results.
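One caveat when comparing the two outputs (a hypothetical helper, not part of the code above): eigenvectors are only defined up to sign, so a raw element-wise difference can look large when a row is merely flipped, as the last row of old_eVec appears to be. Aligning signs before differencing gives a fairer comparison:

```python
import numpy as np

def eigvec_abs_diff(a, b):
    """Element-wise absolute difference between two stacks of row
    eigenvectors, after aligning the sign of each matching row."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Flip rows of b whose dot product with the matching row of a is negative
    signs = np.sign(np.sum(a * b, axis=1, keepdims=True))
    signs[signs == 0] = 1.0
    return np.abs(a - signs * b)
```

With this, `eigvec_abs_diff(eVec, old_eVec)` would report zero for rows that differ only by an overall sign.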

benxyzzy
  • Great answer :) However, I found some relevant updates your code needs. Flags passed to `calcCovarMatrix()` should now be of the form `cv2.COVAR_NORMAL`, for example (dropping the cv and CV_). Also, the method `eigen()` no longer receives the `computeEigenvectors` parameter; in fact it worked for me without passing any argument but the covar matrix. With those updates I was able to achieve this, thank you. – DarkCygnus Jan 17 '18 at 22:11