
In my understanding, a fully connected layer (FC for short) is used for prediction.

For example, VGGNet uses two FC layers of dimension 4096 each. The last layer, which feeds the softmax, has the same dimension as the number of classes: 1000.

[figure: VGG net architecture]

ResNet, however, uses global average pooling and feeds the pooled result of the last convolutional layer into the classifier.

[figure: ResNet architecture]

But it still has an FC layer! Is that layer really an FC layer? Or does it only turn the input into a feature vector whose length equals the number of classes? Does this layer play a role in producing the prediction?

In short, how many FC layers do ResNet and VGGNet have? And do VGGNet's first, second, and third FC layers serve different functions?

David Ding

2 Answers


VGG has three FC layers: two with 4096 neurons each, and one with 1000 neurons that outputs the class probabilities.
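For illustration, here is a minimal Keras-style sketch of that classifier head (my own snippet, not the library's actual implementation; the 7x7x512 input shape is VGG-16's conv output for 224x224 images):

```python
from tensorflow.keras import layers, models

# VGG-16 head: flatten the conv features, then three FC (Dense) layers.
vgg_head = models.Sequential([
    layers.Flatten(input_shape=(7, 7, 512)),   # conv output for 224x224 input
    layers.Dense(4096, activation="relu"),     # FC 1
    layers.Dense(4096, activation="relu"),     # FC 2
    layers.Dense(1000, activation="softmax"),  # FC 3: class probabilities
])
```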

ResNet has only one FC layer, with 1000 neurons, which again outputs the class probabilities. In a neural-network classifier, softmax is the standard choice for the output; some authors make it explicit as a separate layer in the diagram while others do not.
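And the corresponding sketch for ResNet's head (again my own snippet; 7x7x2048 is ResNet-50's final conv output for 224x224 images):

```python
from tensorflow.keras import layers, models

# ResNet head: global average pooling collapses each 7x7 feature map to a
# single number, leaving a 2048-d vector, then one FC layer with softmax.
resnet_head = models.Sequential([
    layers.GlobalAveragePooling2D(input_shape=(7, 7, 2048)),
    layers.Dense(1000, activation="softmax"),  # the one FC layer
])
```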

Dr. Snoopy
  • It's misleading to call the output layer of ResNet a fully-connected layer. The output layer is a softmax layer, which, unlike a fully-connected layer in VGG-16, does not have any trainable parameters. – Fijoy Vadakkumpadan Nov 25 '22 at 22:06
  • @FijoyVadakkumpadan No, it is not; you are incorrect, because a ResNet trained for ImageNet does have an FC layer, see some implementations: https://github.com/keras-team/keras/blob/069b8d3bc15dbb13b6311fee52c91d6a78985bfb/keras/applications/resnet.py#L203 – Dr. Snoopy Nov 25 '22 at 22:14
  • While the implementation you point to technically has a dense layer, there's no need to use a dense layer there. You missed what feeds into that dense layer: a Global Average Pooling layer. That dense layer can be replaced with a 1x1 convolutional layer. – Fijoy Vadakkumpadan Nov 25 '22 at 22:39
  • @FijoyVadakkumpadan Irrelevant, your main argument was that the layer would have no trainable parameters, while a 1x1 conv does have trainable parameters. – Dr. Snoopy Nov 25 '22 at 22:40
  • No, my argument was that calling a softmax layer a fully connected layer is misleading. You missed that too. – Fijoy Vadakkumpadan Nov 25 '22 at 22:41
  • @FijoyVadakkumpadan That is also incorrect, because it is an FC layer with a softmax activation; you have not shown otherwise. – Dr. Snoopy Nov 25 '22 at 22:42
  • No, my original comment is really about softmax when it's mentioned as a separate layer. – Fijoy Vadakkumpadan Nov 25 '22 at 22:46
  • @FijoyVadakkumpadan Maybe I should remind you that the question was "Do ResNets have FC layers?", not whether the last layer is split between FC and softmax. – Dr. Snoopy Nov 26 '22 at 19:26

In essence, the folks at Microsoft (ResNet) favor convolutional layers over fully connected ones and therefore omit the large fully connected layers. Global average pooling also shrinks the feature size dramatically, which reduces the number of parameters at the transition from the convolutional part to the fully connected part.

I would argue that the performance difference is quite slim, but one of the main accomplishments of ResNet is its dramatic reduction in parameter count, and those two design choices helped accomplish that.
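To make that reduction concrete, here is a back-of-the-envelope count of the FC parameters (plain Python; the shapes are the standard ones for 224x224 ImageNet inputs, counting weights plus biases):

```python
# VGG-16 head: 7*7*512 flattened features -> 4096 -> 4096 -> 1000
vgg_fc = (7 * 7 * 512 + 1) * 4096 + (4096 + 1) * 4096 + (4096 + 1) * 1000
# ResNet-50 head: 2048-d globally pooled vector -> 1000
resnet_fc = (2048 + 1) * 1000

print(f"VGG-16 FC params:    {vgg_fc:,}")    # 123,642,856 (~123.6M)
print(f"ResNet-50 FC params: {resnet_fc:,}")  # 2,049,000 (~2.0M)
```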

Thomas Pinetz