I am very new to the field of deep learning. While I understand how it works, and I have managed to run some tutorials with the Caffe library, I still have some questions to which I was unable to find satisfying answers.
My questions are as follows:
Consider AlexNet, which takes a 227 x 227 image as input in Caffe (I think in the original paper it is 224), and whose FC7 layer produces a 4096-D feature vector. Now, if I want to detect a person using a sliding window of size 32 x 64, each of these windows will be upscaled to 227 x 227 before going through AlexNet. That is a lot of computation. Is there a better way to handle this 32 x 64 window?
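To make the computation concrete, this is roughly the naive pipeline I have in mind (a minimal pycaffe sketch, assuming a standard BVLC AlexNet deploy definition; the file names are placeholders, and I have left out mean subtraction for brevity):

```python
import caffe
import cv2
import numpy as np

# Placeholder paths: a standard AlexNet deploy prototxt and pretrained weights.
net = caffe.Net('alexnet_deploy.prototxt', 'bvlc_alexnet.caffemodel', caffe.TEST)
net.blobs['data'].reshape(1, 3, 227, 227)

def fc7_feature(window):
    """Upscale one 32 x 64 BGR window to 227 x 227 and return its FC7 feature."""
    resized = cv2.resize(window, (227, 227))                  # H x W x 3
    net.blobs['data'].data[...] = resized.transpose(2, 0, 1)  # to 1 x 3 x 227 x 227
    net.forward()
    return net.blobs['fc7'].data[0].copy()                    # 4096-D vector
```

Running the whole network once per window like this is exactly the cost I would like to avoid.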
My approach to this 32 x 64 window detector is to build my own network with a few convolutions, poolings, ReLUs, and FC layers. While I understand how to build the architecture, I am afraid the model I train might have issues such as overfitting. A friend of mine told me to pretrain my network using AlexNet, but I don't know how to do this, and I cannot get hold of him to ask right now. Does anyone think that what he suggested is doable? I am confused. I was also thinking of training my network (which takes a 32 x 64 input) on ImageNet; since this is just a feature extractor, I feel that ImageNet would provide enough variety of images for good learning. Please correct me if I am wrong, and if possible guide me onto the correct path.
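If I have understood the Caffe fine-tuning examples correctly, the recipe would be: keep the names of the layers whose weights I want to reuse from AlexNet, rename the layers whose shapes change (the FC layers, since my input is 32 x 64 rather than 227 x 227), and copy the pretrained weights into the net before solving. A rough sketch, assuming a hypothetical my_solver.prototxt that points at my own train prototxt:

```python
import caffe

caffe.set_mode_gpu()

# Layers in my train prototxt that share a name (and weight shape) with
# AlexNet, e.g. conv1, get initialized from the snapshot; renamed layers,
# e.g. fc6_person, keep their random initialization and train from scratch.
solver = caffe.SGDSolver('my_solver.prototxt')
solver.net.copy_from('bvlc_alexnet.caffemodel')
solver.solve()
```

Is this what my friend meant, or is there more to it?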
This question is specifically about Caffe. Say I compute features using HOG, and I want to use the GPU version of the neural net to train a classifier on them. Is that possible? I was thinking of using an HDF5 layer to read the HOG feature vectors and pass them to fully connected layers for training. Is that possible?
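In other words, something like the following h5py sketch (the feature dimension, file names, and label values are placeholders). As far as I can tell, the HDF5Data layer takes a text file listing the .h5 files, and the dataset names inside each file have to match the layer's top blob names:

```python
import h5py
import numpy as np

# Placeholder data: N HOG vectors of dimension D, with binary labels.
N, D = 1000, 3780
hog_features = np.random.randn(N, D).astype(np.float32)
labels = np.random.randint(0, 2, size=(N, 1)).astype(np.float32)

# Dataset names 'data' and 'label' must match the HDF5Data layer's top blobs;
# Caffe expects float32 for both.
with h5py.File('hog_train.h5', 'w') as f:
    f.create_dataset('data', data=hog_features)
    f.create_dataset('label', data=labels)

# The layer's 'source' field points to a text file listing one .h5 per line.
with open('hog_train.txt', 'w') as f:
    f.write('hog_train.h5\n')
```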
I would appreciate any help, or links to papers etc. that may help me understand the ideas behind ConvNets.