Error - Calculating Euclidean distance for PCA in python

Question

I am trying to implement face recognition with Principal Component Analysis (PCA) in Python. I am following the steps in this tutorial: http://onionesquereality.wordpress.com/2009/02/11/face-recognition-using-eigenfaces-and-distance-classifiers-a-tutorial/

Here is my code:

import os
from PIL import Image
import numpy as np
import glob
import numpy.linalg as linalg


#Step1: put database images into a 2D array
filenames = glob.glob('C:\\Users\\Karim\\Downloads\\att_faces\\New folder/*.pgm')
filenames.sort()
img = [Image.open(fn).convert('L').resize((90, 90)) for fn in filenames]
images = np.asarray([np.array(im).flatten() for im in img])


#Step 2: find the mean image and the mean-shifted input images
mean_image = images.mean(axis=0)
shifted_images = images - mean_image


#Step 3: Covariance
c = np.cov(shifted_images)


#Step 4: Sorted eigenvalues and eigenvectors
eigenvalues,eigenvectors = linalg.eig(c)
idx = np.argsort(-eigenvalues)
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]


#Step 5: Only keep the top 'num_components' eigenvectors
num_components = 20
eigenvalues = eigenvalues[0:num_components].copy()
eigenvectors = eigenvectors[:, 0:num_components].copy()


#Step 6: Finding weights
w = eigenvectors.T * np.asmatrix(shifted_images)


#Step 7: Input image
input_image = Image.open('C:\\Users\\Karim\\Downloads\\att_faces\\1.pgm').convert('L').resize((90, 90))
input_image = np.asarray(input_image).flatten()


#Step 8: get the normalized image, covariance, eigenvalues and eigenvectors for input image
shifted_in = input_image - mean_image
c = np.cov(input_image)
cmat = c.reshape(1,1)
eigenvalues_in, eigenvectors_in = linalg.eig(cmat)


#Step 9: Find weights of input image
w_in = eigenvectors_in.T * np.asmatrix(shifted_in)
print w_in
print w_in.shape

#Step 10: Euclidean distance
d = np.sqrt(np.sum((w - w_in)**2))
idx = np.argmin(d)
match = images[idx]

I am having a problem in Step 10, as I am getting this error:

Traceback (most recent call last):
  File "C:/Users/Karim/Desktop/Bachelor 2/New folder/new3.py", line 59, in <module>
    d = np.sqrt(np.sum((w - w_in)**2))
  File "C:\Python27\lib\site-packages\numpy\matrixlib\defmatrix.py", line 343, in __pow__
    return matrix_power(self, other)
  File "C:\Python27\lib\site-packages\numpy\matrixlib\defmatrix.py", line 160, in matrix_power
    raise ValueError("input must be a square array")
ValueError: input must be a square array

Can anyone help?

I see that you decided to take the eigenvalues and eigenvectors of a 1x1 covariance matrix. You should make sure that's really what you want to do. Covariance measures how two or more sets of data vary together, and when you ran it on your training images, you found how they vary with respect to each other. When you run it on the single input image, you only get that image's variance, which is probably not what you want. I'd have to look more closely at the tutorial and think about it more, but I wanted to warn you so that I don't mislead you with my previous answer. – askewchan, Apr 16 '13 at 16:04
@askewchan Thanks for your advice. I am wondering whether I could get this done with the `PCA` class built into `matplotlib`. Do you have any idea how it works? – user2229953, Apr 16 '13 at 16:40
No, unfortunately I have no experience with it, but if you try to learn it, you can always post your questions here :) – askewchan, Apr 16 '13 at 17:17
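
For readers following the concern raised in the comments above, here is a minimal sketch of how the input image is usually handled in the eigenfaces approach: it is projected onto the eigenvectors obtained from the training set, and no covariance is computed for it. This assumes the standard eigenfaces recipe, which may differ in detail from the linked tutorial. The data is synthetic, and names such as `small_cov` and `eigenfaces` are illustrative, not taken from the question.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic stand-in for the training set: M images, each flattened to N pixels.
M, N = 6, 50
images = rng.rand(M, N)

mean_image = images.mean(axis=0)
shifted_images = images - mean_image                    # M x N

# Small M x M matrix whose eigenvectors lead to the eigenfaces.
small_cov = shifted_images.dot(shifted_images.T)
eigenvalues, eigenvectors = np.linalg.eigh(small_cov)   # symmetric input
order = np.argsort(-eigenvalues)
num_components = 3
eigenvectors = eigenvectors[:, order[:num_components]]  # M x k

# Map back to pixel space and normalise: one eigenface per row (k x N).
eigenfaces = eigenvectors.T.dot(shifted_images)
eigenfaces /= np.linalg.norm(eigenfaces, axis=1, keepdims=True)

# Weights of every training image: one row per image (M x k).
w = shifted_images.dot(eigenfaces.T)

# The input image is projected onto the SAME eigenfaces.
input_image = images[4]      # reuse a training image so the match is known
w_in = (input_image - mean_image).dot(eigenfaces.T)     # length k

# One Euclidean distance per training image, then the closest one.
d = np.sqrt(np.sum((w - w_in) ** 2, axis=1))
idx = int(np.argmin(d))
```

With `w` holding one row of weights per training image and `w_in` holding the same number of weights, the subtraction broadcasts row by row and `idx` indexes directly into `images`.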

score 3 · Accepted Answer · answered Apr 16 '13 at 03:33

I think you want to change the line where you calculate d to something like this:

#Step 10: Euclidean distance
d = np.sqrt(np.sum(np.asarray(w - w_in)**2, axis=1))

This gives you an array of length M (the number of training images, assuming w has one row of weights per image), where each entry is the square root of the summed squared differences between that row and w_in. I believe you don't want the matrix product but the element-wise square of each value, hence the np.asarray, which turns the matrix into a plain array. The result is the Euclidean distance between w_in and each row of w.
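
To make the shapes concrete, here is a small sketch of that row-wise distance with made-up numbers. The values of `w` and `w_in` are hypothetical and assume one row of weights per training image.

```python
import numpy as np

# Hypothetical weights: 3 training images, 2 components each (one row per image).
w = np.asmatrix([[0.0, 0.0],
                 [3.0, 4.0],
                 [1.0, 1.0]])
# Hypothetical weights of the input image, as a 1 x 2 row.
w_in = np.asmatrix([[1.0, 2.0]])

diff = np.asarray(w - w_in)              # plain ndarray, so ** is element-wise
d = np.sqrt(np.sum(diff ** 2, axis=1))   # one distance per row
idx = int(np.argmin(d))                  # index of the row closest to the input
```

Here `d` has one entry per row of `w`, and `idx` picks the third row, whose weights are closest to `w_in`.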

score 1 · Answer 2 · answered Apr 15 '13 at 23:49

When you compute (w - w_in), the result is not a square matrix. To multiply a matrix by itself it must be square (that's just a property of matrix multiplication). So either you've constructed your w and w_in matrices wrong, or what you actually meant to do is square each element of the matrix (w - w_in), which is a different operation. Search for element-wise multiplication to find the NumPy syntax.
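
As a rough illustration of the difference, the sketch below uses a hypothetical 2 x 3 matrix in place of (w - w_in). On `np.matrix` objects `**` means matrix power, so it fails for non-square input, while `np.asarray(...) ** 2` and `np.multiply` both square element by element.

```python
import numpy as np

# Hypothetical 2 x 3 difference, standing in for (w - w_in).
diff = np.asmatrix([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

# On np.matrix, ** is matrix power and requires a square matrix.
try:
    diff ** 2
    matrix_power_failed = False
except (ValueError, np.linalg.LinAlgError):
    matrix_power_failed = True

# Two element-wise alternatives:
squared_a = np.asarray(diff) ** 2      # convert to a plain ndarray first
squared_b = np.multiply(diff, diff)    # element-wise product, stays a matrix
```

The exception type is caught broadly here because it may differ between NumPy versions.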

DaveTheScientist

What I want to square is `(W - W_in)`, not each element in the matrix, so maybe the problem is in how I construct `W` and `W_in`. – user2229953 Apr 16 '13 at 00:01
