
I am trying to train a VGG16 model, but the loss does not decrease and it seems that the model's parameters are not updated. Here is the model:

import torch
import torch.nn as nn
import math
import torch.nn.functional as F
from utils import AvgPoolConv
cfg = {
'VGG11': [16, 'M', 32, 'M', 64, 64, 'M', 128, 128, 'M', 128, 128, 'M'],
'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'], 
'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],}
class VGG(nn.Module):
    def __init__(self, vgg_name, use_bn, num_class=100):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[vgg_name], use_bn)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_class)
        )

        #self.classifier = nn.Linear(512, num_class)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 1.0 / float(n))
                m.bias.data.zero_()

    def forward(self, x):
        out = self.features(x)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg, use_bn=True):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.AvgPool2d(2)]
                #layers += [AvgPoolConv(kernel_size=2, stride=2, input_channel=in_channels)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x) if use_bn else nn.Dropout(0.25),
                           nn.ReLU(inplace=True)]
                in_channels = x
        #layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

But if I delete the first two FC layers from the classifier, as shown below, the model trains and the loss is optimized:

self.features = self._make_layers(cfg[vgg_name], use_bn)
self.classifier = nn.Linear(512, num_class)

Why does this happen?


1 Answer


First, it would be good to verify whether the parameters are really not updated, or whether the change is just small. Different architectures might require different tuning (learning rate, weight decay if you use it, etc.). A good thing to try when debugging is a "can I overfit it?" test: take a single batch (or even a single sample) and check whether you can drive the loss to 0; you might need to tweak the optimization parameters mentioned before. A minimal sketch of such a check follows.
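For instance, something like the following (a sketch only, assuming the VGG class from the question and CIFAR-sized 32x32 inputs; the random inputs/targets, the SGD settings, and the step count are placeholders you would replace with one real batch from your loader and your own optimizer):

import torch
import torch.nn as nn

model = VGG('VGG16', use_bn=True, num_class=100)   # the class from the question
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # placeholder settings

# stand-in single batch; replace with one real batch from your loader
inputs = torch.randn(8, 3, 32, 32)
targets = torch.randint(0, 100, (8,))

# snapshot the parameters so the change after training can be measured
before = [p.detach().clone() for p in model.parameters()]

for step in range(200):                     # try to overfit this one batch
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

# a near-zero change means the parameters really are not updated;
# a small but non-zero change means the gradient flows and it is a tuning issue
max_change = max((p - b).abs().max().item() for p, b in zip(model.parameters(), before))
print('final loss:', loss.item(), 'max parameter change:', max_change)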

Assuming everything is correct and the gradient flows, I'd say: tune the learning rate and try adding batch normalization between your linear and ReLU layers (it should make training much faster), as sketched below.
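For example, one possible variant of the classifier with nn.BatchNorm1d between each Linear and ReLU (a sketch only, meant as a drop-in replacement for the self.classifier assignment in the question's __init__; the layer sizes are the ones from the question):

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, 4096),
            nn.BatchNorm1d(4096),   # normalizes the 4096-dim activations before the ReLU
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.BatchNorm1d(4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_class)
        )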

burzan
  • Thank you for your reply; the parameters are really not updated. I wonder whether this is because I am using Colab. Does Colab have a limit on the number of trainable parameters, so that it stops the training process when I add layers? Do you have any idea? – Musheer Abdullah Jul 12 '22 at 14:58
  • There's no such limitation. – burzan Jul 13 '22 at 08:03