
I have a query regarding the extraction of VGG16/VGG19 features for my experiments.

The pre-trained VGG16 and VGG19 models were trained on the ImageNet dataset, which has 1000 classes (say c1, c2, ..., c1000). Normally we extract features from the first and second fully connected layers, designated 'FC1' and 'FC2'; these 4096-dimensional feature vectors are then used for computer vision tasks.
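
For concreteness, here is a minimal sketch of that extraction in Keras; 'fc1' and 'fc2' are the names Keras actually gives these layers, while some_image.jpg is a placeholder path:

    import numpy as np
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
    from tensorflow.keras.models import Model
    from tensorflow.keras.preprocessing import image

    base = VGG16(weights='imagenet')                # pre-trained on ImageNet
    # Cut the network at FC1 and use it as a feature extractor
    extractor = Model(inputs=base.input,
                      outputs=base.get_layer('fc1').output)

    img = image.load_img('some_image.jpg', target_size=(224, 224))  # placeholder
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    features = extractor.predict(x)                 # shape: (1, 4096)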

My question is: can we use these networks to extract features of an image that does not belong to any of the above 1000 classes? In other words, can we use them to extract features of an image with label c1001, where c1001 is not among the ImageNet classes on which these networks were originally trained?

The article at https://www.pyimagesearch.com/2019/05/20/transfer-learning-with-keras-and-deep-learning/ says the following:

When performing feature extraction, we treat the pre-trained network as an arbitrary feature extractor, allowing the input image to propagate forward, stopping at a pre-specified layer, and taking the outputs of that layer as our features

According to the above text, there is no restriction that the image must belong to one of the ImageNet classes.

Kindly spare some time to uncover this mystery.

In research papers, authors simply state that they used features extracted from a VGG16/VGG19 network pre-trained on the ImageNet dataset, without giving further details.

Here is a case study for reference:

The Animals with Attributes 2 (AwA2) dataset (see https://cvml.ist.ac.at/AwA2/) is a very popular dataset with 50 animal classes for image recognition tasks. The authors extracted ILSVRC-pretrained ResNet101 features for the dataset images. This ResNet101 network was pre-trained on the 1000 ImageNet classes (the class list is available at https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a#file-imagenet1000_clsidx_to_labels-txt).

Also, the AwA classes are as follows:

antelope, grizzly+bear, killer+whale, beaver, dalmatian, persian+cat, horse, german+shepherd, blue+whale, siamese+cat, skunk, mole, tiger, hippopotamus, leopard, moose, spider+monkey, humpback+whale, elephant, gorilla, ox, fox, sheep, seal, chimpanzee, hamster, squirrel, rhinoceros, rabbit, bat, giraffe, wolf, chihuahua, rat, weasel, otter, buffalo, zebra, giant+panda, deer, bobcat, pig, lion, mouse, polar+bear, collie, walrus, raccoon, cow, dolphin

Now, if we compare the dataset classes with the 1000 ImageNet classes, we find that classes like dolphin, cow, raccoon, bobcat, bat, seal, sheep, horse, grizzly bear, and giraffe are not in ImageNet, and still the authors went ahead with extracting ResNet101 features. I believe the extracted features are generalizable, which is why the authors consider them meaningful representations for the AwA images.
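
For illustration, the same kind of extraction can be sketched in Keras; awa_image.jpg is a placeholder path, and the AwA authors' exact pipeline may differ:

    import numpy as np
    from tensorflow.keras.applications.resnet import ResNet101, preprocess_input
    from tensorflow.keras.preprocessing import image

    # ILSVRC-pretrained ResNet101; include_top=False drops the 1000-class
    # head, pooling='avg' yields one pooled feature vector per image
    model = ResNet101(weights='imagenet', include_top=False, pooling='avg')

    img = image.load_img('awa_image.jpg', target_size=(224, 224))  # placeholder
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    features = model.predict(x)   # shape: (1, 2048)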

What is your take on this?

The idea is to get representations for images that do not belong to the ImageNet classes and use them, along with their labels, in some other classifier.
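
For example, a minimal sketch of that pipeline, assuming hypothetical arrays features (n_samples x 4096, from the extraction step above) and labels, with scikit-learn as an illustrative choice of downstream classifier:

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # `features` and `labels` are assumed to come from the extraction step
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=0)

    clf = LogisticRegression(max_iter=1000)   # any classifier would do here
    clf.fit(X_train, y_train)
    print('held-out accuracy:', clf.score(X_test, y_test))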

Upendra01

3 Answers


Yes, you can, but.

Features in the first fully connected layers are supposed to encode fairly general patterns, like angles, lines, and simple shapes. You can assume those generalize outside the class set the network was trained on.

There is one "but", however: those features were learned so as to minimize the error on that particular 1000-class classification task. This means there is no guarantee that they are helpful for classifying an arbitrary class.

ptyshevs

For feature extraction alone, you can feed any image you want into your pre-trained VGG or other CNN. However, if you want to train on new classes, you have to implement the additional steps described below.

The extracted features were determined by training exclusively on those 1000 classes. You can use the network to predict on images that do not belong to those 1000 classes, but in the paragraphs below I explain why that is not the desired approach.

The key point is that the set of extracted features can be used to detect or determine the presence of other objects within a photo, but not "out of the box".

For example, edges and lines are features that are not exclusive to those 1000 classes; they apply to other classes as well, hence they are useful, general features.

Therefore, you can employ "transfer learning" to train on your own dataset of images, for example classes c1001, c1002, c1003.

Note, however, that you need to train on your own set before you can use the network to predict on your new images (new classes). Transfer learning means reusing the set of already-learned features, which can be suitable for another problem, but you still need to train on your "new problem", say c1001, c1002, c1003, as sketched below.
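
A minimal sketch of this kind of transfer learning in Keras; the head sizes and the three-class setup are placeholders for the hypothetical c1001, c1002, c1003:

    from tensorflow.keras.applications.vgg16 import VGG16
    from tensorflow.keras import layers, models

    base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False   # keep the ImageNet-learned features frozen

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation='relu'),
        layers.Dense(3, activation='softmax'),  # 3 new classes: c1001..c1003
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(new_images, new_labels, ...)    # train only the new head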

Timbus Calin
  • So, assuming that the extracted features represent edges and lines that are not related exclusively to those 1000 classes, can we use these extracted features as a good estimate of image representations for classes c1001 and above? – Upendra01 Aug 26 '20 at 06:33
  • You can estimate it, but your estimation would not be a correct one, since, due to the nature of the softmax function used in training, the network will always assign your new image to one of those 1000 classes. You do not want that; you want your network to use the features it has learned on those 1000 classes to correctly predict your new c1001, c1002, c1003. But for this to happen, you need to train on your new classes and use the concept of transfer learning, as explained beautifully by Adrian Rosebrock. – Timbus Calin Aug 26 '20 at 06:36
  • For example, if I train my network on birds, cats, and dogs, it learns basic features such as shapes, edges, and lines. I cannot use that network to classify living room chairs, but I can transfer the features it learned when previously trained on birds, cats, and dogs and retrain on "living_room_chairs and sofa" classes. – Timbus Calin Aug 26 '20 at 06:37
  • It is important to understand here that retraining must take place in order to correctly predict on your new classes. – Timbus Calin Aug 26 '20 at 06:38
  • Dear @Timbus Calin, one final doubt; please see the revised question. – Upendra01 Aug 26 '20 at 07:10
  • Yes, then the answer is Yes; in the end, you can extract features from every photo you would like! – Timbus Calin Aug 26 '20 at 07:12
  • Please check the case study that I have added to the question. I really appreciate your time. Thanks in advance. – Upendra01 Aug 26 '20 at 07:31
  • Yes, your explanation is good. That is why they extract the features, since they are generalisable, and they can be used for detecting other objects/classes, by means of transfer learning (training on the new classes) – Timbus Calin Aug 26 '20 at 08:06
  • As a token of appreciation for the additional explanations, would you mind accepting my answer as the solution? This is, of course, only if my explanations helped you understand your problem better. – Timbus Calin Aug 26 '20 at 08:06
  • One last comment: once I have extracted those features, can I apply feature pre-processing techniques like normalization, dimensionality reduction, etc. to them? The feature matrix I extracted contains very large numbers, and when it is used in gradient descent algorithms, the gradients and the loss function (which is of course a function of the feature matrix) explode; I normalized the extracted features to the range [0, 1] and then the gradients and the loss function were manageable. Just wondering if my approach is correct. – Upendra01 Aug 27 '20 at 06:18
  • Normally the features are preprocessed before an image is fed to the neural network, not after. – Timbus Calin Aug 27 '20 at 06:38
  • In my setting, I have images in the form of a pixel matrix; this pixel matrix is normalized using the preprocess_input utility of the VGG19 module in Keras. The normalized pixel matrix is then fed to the VGG19 network for feature extraction, giving an (n_samples x 4096) feature matrix. The extracted VGG19 features contain large numbers. Can we normalize them again? – Upendra01 Aug 27 '20 at 07:25
  • Yes, you can normalise them if you just intend to "put them aside" or reuse them for other goals (see the sketch after this thread). – Timbus Calin Aug 27 '20 at 07:28
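
Following up on the normalization discussion in the comments, a minimal sketch, assuming hypothetical train_feats/test_feats matrices from the VGG19 extraction step and scikit-learn's MinMaxScaler:

    from sklearn.preprocessing import MinMaxScaler

    # `train_feats` / `test_feats` are assumed to be the (n_samples x 4096)
    # matrices produced by the VGG19 extraction step.
    scaler = MinMaxScaler()                          # rescales each feature to [0, 1]
    train_feats = scaler.fit_transform(train_feats)  # fit on training features only
    test_feats = scaler.transform(test_feats)        # apply the same scaling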

For image classification you may need to fine-tune the model with classes relevant to the c1001 label. But if you are planning to use it for unsupervised learning, or for the feature extraction part only, then there is no need to retrain the model. You can use the existing pre-trained ImageNet weights and extract features with them, since VGG16/19 learns general lower-level features in its initial layers and only the last few layers are used for classification. So basically, a pre-trained model can be used for unsupervised learning and feature extraction without retraining, as sketched below.
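
A minimal sketch of this retraining-free extraction in Keras, taking pooled convolutional features instead of the classification head; any_image.jpg is a placeholder path:

    import numpy as np
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
    from tensorflow.keras.preprocessing import image

    # include_top=False drops the 1000-class classification layers entirely,
    # so no retraining is involved; pooling='avg' gives one 512-d vector.
    model = VGG16(weights='imagenet', include_top=False, pooling='avg')

    img = image.load_img('any_image.jpg', target_size=(224, 224))  # placeholder
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    features = model.predict(x)   # shape: (1, 512)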

Parth Shah