Fine-tune VGG or AlexNet for non-square inputs

Question

VGG and AlexNet, amongst others, require a fixed image input of square dimensions (H == W). How can one fine-tune or otherwise perform net surgery such that non-square inputs can be provided?

For your reference, I'm using Caffe and intend to extract FC7 features for non-square image inputs.

score 1 · Answer 1 · answered Nov 22 '15 at 07:15

For the convolutional part of the net, the input size does not really matter: the shape of the output simply changes as you change the input size.
The "InnerProduct" layers are different: the shape of their weights is fixed, and it is determined by the input size the net was defined for.
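
To make the shape argument concrete, here is a minimal sketch that traces the spatial size through the VGG-16 convolution and pooling stack in plain Python. It assumes the standard VGG-16 layout (3x3 convolutions with pad 1, 2x2 max pooling with stride 2) and assumes that Caffe rounds pooling output sizes up while convolution sizes round down. The helper names (`out_size`, `feature_map_shape`, `fc6_input_dim`) are made up for illustration and are not part of Caffe.

```python
import math

# Assumed VGG-16 layout: blocks of 3x3 convs (pad 1), each block
# followed by one 2x2 max pool with stride 2.
VGG16_BLOCKS = [2, 2, 3, 3, 3]
VGG16_LAYERS = []
for n_convs in VGG16_BLOCKS:
    VGG16_LAYERS += [("conv", 3, 1, 1)] * n_convs + [("pool", 2, 2, 0)]


def out_size(size, kind, kernel, stride, pad):
    """Output length along one axis.

    Assumption: convolution rounds down, pooling rounds up.
    """
    span = size + 2 * pad - kernel
    if kind == "conv":
        return span // stride + 1
    return math.ceil(span / stride) + 1


def feature_map_shape(height, width, layers=VGG16_LAYERS):
    """Spatial shape of the last pooling output for a given input size."""
    for kind, kernel, stride, pad in layers:
        height = out_size(height, kind, kernel, stride, pad)
        width = out_size(width, kind, kernel, stride, pad)
    return height, width


def fc6_input_dim(height, width, channels=512):
    """Number of inputs the first fully connected layer would need."""
    h, w = feature_map_shape(height, width)
    return channels * h * w


print(feature_map_shape(224, 224), fc6_input_dim(224, 224))
print(feature_map_shape(224, 320), fc6_input_dim(224, 320))
```

The convolutional part handles the 224x320 input without complaint, but the flattened feature map no longer has the length that the pretrained fc6 weights expect, which is why the fully connected layers are the only obstacle.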

You can perform "net surgery" and convert your "InnerProduct" layers into "Convolution" layers. This way your net can process inputs of any size. However, your outputs will also vary in shape.
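
The conversion works because a fully connected layer is a convolution whose kernel covers the whole feature map it was trained on. Below is a toy sketch of that equivalence in plain Python, not Caffe code: the sizes are tiny stand-ins (for VGG-16 fc6 they would be 512 channels, a 7x7 kernel and 4096 outputs), and all function names are hypothetical. It assumes the feature map is flattened in channel, row, column order before the inner product.

```python
def inner_product(x, weights, bias):
    """Fully connected layer; x is [C][H][W], flattened in C, H, W order."""
    flat = [v for channel in x for row in channel for v in row]
    return [sum(w * v for w, v in zip(w_row, flat)) + b
            for w_row, b in zip(weights, bias)]


def fc_to_conv_kernels(weights, channels, kh, kw):
    """Reshape (out, C*kh*kw) weight rows into (out, C, kh, kw) kernels."""
    return [[[[w_row[(c * kh + i) * kw + j] for j in range(kw)]
              for i in range(kh)]
             for c in range(channels)]
            for w_row in weights]


def convolve(x, kernels, bias):
    """Stride-1, unpadded convolution (cross-correlation, as in CNNs)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out = []
    for kernel, b in zip(kernels, bias):
        fmap = []
        for top in range(height - kh + 1):
            row = []
            for left in range(width - kw + 1):
                acc = b
                for c in range(channels):
                    for i in range(kh):
                        for j in range(kw):
                            acc += kernel[c][i][j] * x[c][top + i][left + j]
                row.append(acc)
            fmap.append(row)
        out.append(fmap)
    return out


# Toy sizes standing in for a real fully connected layer.
C, KH, KW, OUT = 2, 2, 2, 3
weights = [[(o + 1) * (k - 3) for k in range(C * KH * KW)] for o in range(OUT)]
bias = [1, 2, 3]


def make_input(height, width):
    """Deterministic integer feature map of shape [C][height][width]."""
    return [[[c * 7 + i * 3 + j for j in range(width)]
             for i in range(height)]
            for c in range(C)]


kernels = fc_to_conv_kernels(weights, C, KH, KW)

square = make_input(KH, KW)               # size the FC layer was defined for
fc_out = inner_product(square, weights, bias)
conv_out = convolve(square, kernels, bias)  # shape [OUT][1][1]

wide = make_input(2, 4)                   # non-square feature map
wide_out = convolve(wide, kernels, bias)  # shape [OUT][1][3]
```

On the original input size the converted layer reproduces the fully connected output exactly, as a 1x1 map per output unit. On the wider input the same weights yield a 1x3 map per output unit, which is the "outputs will also vary in shape" effect. As far as I recall, Caffe's own net surgery example performs the same weight transfer by copying the fully connected parameters flat into the new convolution layer.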

Another option is to define your net according to a new fixed input size, re-use all the learned weights of the convolutions, and fine-tune only the weights of the fully connected layers.

If I convert the InnerProduct layers to Convolution layers, how do I extract a 1D feature vector? (Note: my goal is to extract features for image search.) - E.W., Nov 23 '15 at 04:01
