
VGG and AlexNet, amongst others, require a fixed image input of square dimensions (H == W). How can one fine-tune or otherwise perform net surgery such that non-square inputs can be provided?

For your reference, I'm using Caffe and intend to extract FC7 features for non-square image inputs.

stop-cran
E.W.

1 Answer


For the convolutional part of the net, the input size does not really matter: the shape of the output simply changes as you change the input size.
"InnerProduct" layers are different: the shape of their weights is fixed, and it is determined by the size of their input.

You can perform "net surgery", converting your "InnerProduct" layers into "Convolution" layers. This way your net can process inputs of any size. However, your outputs will also vary in shape.
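To see why the conversion works, here is a minimal NumPy-only sketch (not Caffe code). It assumes the fully connected layer sits directly on top of a C x H x W feature map that is flattened in row-major order, and it uses toy sizes rather than the real VGG dimensions. The helper names `fc_to_conv_weights` and `conv_valid` are made up for this illustration.

```python
import numpy as np

def fc_to_conv_weights(fc_weights, channels, kernel_h, kernel_w):
    """Reshape (num_output, C*H*W) FC weights into (num_output, C, H, W) kernels."""
    num_output = fc_weights.shape[0]
    return fc_weights.reshape(num_output, channels, kernel_h, kernel_w)

def conv_valid(feature_map, kernels, bias):
    """Naive stride-1, no-padding convolution (cross-correlation) over a C x H x W map."""
    c, h, w = feature_map.shape
    n, kc, kh, kw = kernels.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.empty((n, out_h, out_w))
    flat_kernels = kernels.reshape(n, -1)
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[:, i:i + kh, j:j + kw]
            out[:, i, j] = flat_kernels @ patch.ravel() + bias
    return out

# Toy stand-in for an FC layer trained on a 4 x 3 x 3 feature map (hypothetical sizes).
rng = np.random.default_rng(0)
channels, kh, kw, num_output = 4, 3, 3, 5
fc_w = rng.standard_normal((num_output, channels * kh * kw))
fc_b = rng.standard_normal(num_output)
kernels = fc_to_conv_weights(fc_w, channels, kh, kw)

# On the original (square) feature map, both layers give the same numbers.
square = rng.standard_normal((channels, 3, 3))
fc_out = fc_w @ square.ravel() + fc_b
conv_out = conv_valid(square, kernels, fc_b)
assert np.allclose(conv_out[:, 0, 0], fc_out)

# On a wider feature map, the converted layer still runs and returns a spatial map.
wide = rng.standard_normal((channels, 3, 6))
wide_out = conv_valid(wide, kernels, fc_b)
print(wide_out.shape)  # (5, 1, 4)
```

On the wider input, the converted layer produces one output vector per spatial position, which is the "outputs will also vary in shape" effect described above.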

Another option is to define your net for a new fixed input size, reuse all the learned weights of the convolutional layers, and fine-tune only the weights of the fully connected layers.
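For this second option it helps to know how large the input to the first fully connected layer becomes. The sketch below assumes the standard VGG-16 layout (3x3 convolutions with padding 1, which keep the spatial size; five 2x2, stride-2 pooling layers; 512 channels at pool5) and Caffe's round-up rule for pooling output sizes. The function names are made up for this illustration.

```python
import math

def caffe_pool_out(size, kernel=2, stride=2, pad=0):
    """Pooling output size along one axis, using Caffe's ceil rounding."""
    return int(math.ceil((size + 2 * pad - kernel) / stride)) + 1

def vgg16_pool5_shape(height, width, num_pools=5, channels=512):
    """(C, H, W) of pool5 for a given input size, assuming the VGG-16 layout."""
    for _ in range(num_pools):
        height = caffe_pool_out(height)
        width = caffe_pool_out(width)
    return channels, height, width

def fc6_input_dim(height, width):
    """Number of inputs the first fully connected layer sees after flattening pool5."""
    c, h, w = vgg16_pool5_shape(height, width)
    return c * h * w

print(fc6_input_dim(224, 224))  # 25088, the size the pretrained fc6 weights expect
print(fc6_input_dim(224, 320))  # 35840, so fc6 needs a new weight matrix
```

Under these assumptions, a 224x320 input changes the fc6 weight shape, so that layer would have to be re-initialized and trained, while layers whose input size is unchanged (such as fc7) can start from the pretrained weights.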

user2469775
  • If I convert the InnerProduct layers to Convolution layers, how do I extract a 1D feature vector? (Note: my goal is to extract features for image search.) – E.W. Nov 23 '15 at 04:01