
I was reading this paper http://www.robots.ox.ac.uk/~vgg/publications/2011/Chatfield11/chatfield11.pdf and I found the Fisher Vector with GMM vocabulary approach very interesting, so I would like to test it myself.

However, it is totally unclear to me how they apply PCA dimensionality reduction to the data. Do they build the full feature space first and then run PCA on it, or do they run PCA on each image's SIFT descriptors individually and then build the feature space?

Is this supposed to be done for both the training and the test set? To me the answer is 'obviously yes', but the paper doesn't make it explicit.

I was thinking of building the feature space from the training set and then running PCA on it. Then I could use the PCA coefficients learned from the training set to reduce the SIFT descriptors of each image that is going to be encoded into a Fisher Vector for later classification, whether it is a test or a training image.

EDIT 1:

Simplistic example:

    [coef, reduced_feat_space] = pca(Feat_Space', 'NumComponents', 80);

and then (for both test and train images)

    reduced_test_img = test_img * coef;

(Since 'NumComponents' is 80, coef already has exactly 80 columns, so there is no need to pick out the first 80 dimensions afterwards.)
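One detail I am not sure about: MATLAB's pca centers the data before computing the components and returns that mean as its sixth output, so I would guess the test descriptors also have to be centered with the training mean before projecting. Something like:

    % capture the training mean that pca subtracted internally
    [coef, ~, ~, ~, ~, mu] = pca(Feat_Space', 'NumComponents', 80);

    % center with the TRAINING mean, then project to 80 dimensions
    reduced_test_img = bsxfun(@minus, test_img, mu) * coef;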

What do you think? Cheers

T Stsn.

1 Answer


It looks to me like they do SIFT first and then do PCA. The article states in section 2.1: "The local descriptors are fixed in all experiments to be SIFT descriptors..."

Also, from the introduction: "the following three steps: (i) extraction of local image features (e.g., SIFT descriptors), (ii) encoding of the local features in an image descriptor (e.g., a histogram of the quantized local features), and (iii) classification ... Recently several authors have focused on improving the second component". So it looks to me like the dimensionality reduction occurs after SIFT, and the paper is simply comparing a few different methods of doing this and the performance of each.
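To make that order concrete, here is a minimal sketch of the three steps using the VLFeat MATLAB toolbox (VLFeat is my choice of tooling, not something the paper prescribes; vl_sift, vl_gmm and vl_fisher are its functions, and the variable names are mine):

    % (i) local features for one image: SIFT gives 128-dim descriptors
    [~, descrs] = vl_sift(single(rgb2gray(im)));        % descrs is 128 x N

    % PCA and the GMM vocabulary are learned ONCE, on descriptors pooled
    % from the training set (train_descrs, rows = descriptors), then reused:
    [coef, ~, ~, ~, ~, mu] = pca(train_descrs, 'NumComponents', 80);
    reduced_train = bsxfun(@minus, train_descrs, mu) * coef;
    [means, covs, priors] = vl_gmm(reduced_train', 256);   % 256 Gaussians

    % (ii) for ANY image, train or test: reduce, then Fisher-encode
    reduced = bsxfun(@minus, double(descrs'), mu) * coef;  % N x 80
    fv = vl_fisher(reduced', means, covs, priors);         % image descriptor

    % (iii) hand the Fisher Vectors to a linear SVM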

I would also guess (as you did) that you would have to run it on both sets of images. Otherwise you would be using two different metrics to classify the images, and that really is comparing apples to oranges. Comparing a reduced-dimensional representation to the full one (even for the exact same image) will show some variation. In fact, that is the whole premise of PCA: you give up some of the smaller features (usually) in exchange for computational efficiency. The real question with PCA, or any dimensionality-reduction algorithm, is: how much information can I give up and still reliably classify/segment different data sets?
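As a concrete illustration (mine, not the paper's): MATLAB's pca reports the variance explained by each component, so you can measure exactly how much information a given cut-off throws away:

    % explained(i) = percentage of total variance carried by component i
    [coef, ~, ~, ~, explained, mu] = pca(Feat_Space');

    % smallest number of dimensions that still retains, say, 95% of it
    nKeep = find(cumsum(explained) >= 95, 1);
    coef  = coef(:, 1:nKeep);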

And as a last point, you have to treat both sets of images the same way, because your end goal is to use the Fisher Vector for classification, so the test and training images have to live in the same representation. Now imagine you decided that training images don't get PCA but test images do. If I hand you some image X, what would you do with it? How could you treat one set of images differently from another BEFORE you've classified them? Using the same technique on both sets means you'd process my image X first and then decide where to put it.
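In code the point is simply that one pipeline serves both sets; here describe, reduce and encode are hypothetical helpers standing in for SIFT extraction, the training-fitted PCA projection and the Fisher encoding:

    % the SAME preprocessing is applied to every image, train or test
    pipeline  = @(im) encode(reduce(describe(im)));
    train_fvs = cellfun(pipeline, train_imgs, 'UniformOutput', false);
    test_fvs  = cellfun(pipeline, test_imgs,  'UniformOutput', false);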

Anyway, I hope that helped and wasn't too rant-like. Good luck :-)

andrew