
In the cs231n handout here, it says:

New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns... Hence, the best idea might be to train a linear classifier on the CNN codes.

I'm not sure what "linear classifier" means. Does the linear classifier refer to the last fully connected layer? (For example, in AlexNet there are three fully connected layers. Is the linear classifier the last of those?)

MoneyBall

1 Answer


Usually when people say "linear classifier" they refer to a linear SVM (support vector machine). A linear classifier learns a weight vector w and a threshold (aka "bias") b such that for each example x the sign of

<w, x> + b

is positive for the "positive" class and negative for the "negative" class.

The last (usually fully connected) layer of a neural net can be considered a form of linear classifier.
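
For concreteness, here is a minimal sketch of "training a linear classifier on the CNN codes". The scikit-learn LinearSVC and the random feature matrix are illustrative choices, not something the handout prescribes; in practice X_train would hold penultimate-layer activations (e.g. AlexNet's 4096-dimensional fc7 output) for your training images.

```python
# Minimal sketch: a linear SVM on (stand-in) CNN codes.
# Assumes NumPy and scikit-learn; the random features are placeholders
# for real penultimate-layer activations extracted from a frozen ConvNet.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.standard_normal((500, 4096))  # stand-in for extracted CNN codes
y_train = rng.integers(0, 2, size=500)      # binary labels

clf = LinearSVC(C=1.0)  # learns the weight vector w and bias b
clf.fit(X_train, y_train)

# The decision rule is exactly sign(<w, x> + b):
scores = X_train @ clf.coef_.ravel() + clf.intercept_
print(np.array_equal(scores > 0, clf.predict(X_train) == 1))  # True
```

Note that the ConvNet itself stays frozen in this setup: it is used only as a fixed feature extractor, which is why overfitting is less of a concern than with full fine-tuning.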

Shai
  • Seriously, you have been a life saver. I can't thank you enough. Here's a follow-up question: is it recommended to train only the last fully connected layer for transfer learning, or is training multiple fc layers also okay, even though more than one fc layer will be polynomial, not linear? – MoneyBall May 05 '17 at 11:47
  • @MoneyBall if you do not have enough training samples, it's not recommended to train more than the topmost fc layer. BTW, two linear layers stacked together are NOT polynomial. Write down the math and you'll see (spelled out in the note after these comments). – Shai May 05 '17 at 11:50
  • how many training samples is considered enough? I have roughly 50,000 training samples. – MoneyBall May 07 '17 at 03:12
  • @MoneyBall how many parameters do you have in the fully connected layer? – Shai May 07 '17 at 03:37
  • 4096 weights, I believe (I'm using AlexNet and GoogLeNet at the moment). For GoogLeNet, I guess it would be 3*4096 since there are three fc layers – MoneyBall May 07 '17 at 03:39
  • @MoneyBall these are quite large layers. I wouldn't go deeper than one or two layers with this number of training samples – Shai May 07 '17 at 11:39
  • @Shai I see. When I only train the final fully connected layer, starting from pre-trained ImageNet weights, it results in about 76% accuracy on top-1 and 95% accuracy on top-5 (this is classifying 50 labels). I was expecting a lot better top-1 accuracy since there are only 50 labels. Since my top-1 wasn't so great, I thought maybe training another fully connected layer would do the trick. – MoneyBall May 07 '17 at 11:45
  • @MoneyBall you may try and see – Shai May 07 '17 at 11:49
  • @Shai Haha, yes, I can always do that. I think I'm struggling here a little bit because I can't quite find a systematic way to get better accuracy. – MoneyBall May 07 '17 at 11:50
  • @MoneyBall the funny thing about the FC layers is that they have TONS of free parameters (in VGG, for instance, there are ~16M free parameters in one FC layer!), yet it seems like most of these parameters have little impact on the model. Thus it is very dangerous to train these layers: you can easily overfit if you are not careful, yet it is hard to move them to the right place. You may try [using the SVD trick](http://stackoverflow.com/q/40480827/1714410) to reduce the number of parameters in the FC layers and then fine-tune the reduced model (a sketch of this appears after these comments). – Shai May 07 '17 at 12:07
  • @Shai Hmm. So while I was thinking about increasing the number of weights by training an additional FC layer, you're saying I should instead reduce the existing final FC layer's weights using the SVD trick. This is counter-intuitive, but then again, my intuition on CNNs comes from very limited experience. I will try the SVD trick. Again, thank you so much for all the help! – MoneyBall May 07 '17 at 12:12
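
To spell out the math Shai alludes to above (a note, not part of the original thread): stacking two fully connected layers with no nonlinearity between them gives

W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2) = W' x + b'

which is again a single affine (linear) map, not a polynomial one. The stack only becomes more expressive when a nonlinearity such as ReLU sits between the layers.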
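
The SVD trick from the linked answer can be sketched with plain NumPy. The 4096x4096 shape and the retained rank k = 256 below are illustrative choices, not values taken from AlexNet or GoogLeNet:

```python
# Minimal sketch of compressing one FC layer with a truncated SVD.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096))  # stand-in for a pretrained FC weight matrix

k = 256  # retained rank (a hyperparameter trading size against accuracy)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]  # shape (4096, k)
B = Vt[:k, :]         # shape (k, 4096)

# One wide layer W is replaced by two thin layers (B, then A):
# W @ x is approximated by A @ (B @ x), with 2*4096*k ~ 2.1M parameters
# instead of 4096*4096 ~ 16.8M.
x = rng.standard_normal(4096)
print(np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x))
```

After the replacement, the reduced model is fine-tuned. Since a random matrix has a flat spectrum, the relative error printed here is large; trained FC layers are typically much closer to low-rank and compress far better.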