
I am training a CNN classifier in PyTorch. There are 6 labels, with 700 training images and 10 validation images per label, so each class makes up 16.7% of the dataset. The batch size is 10 and the learning rate is 0.000001 (1e-6). I have trained for 60 epochs, and the architecture has 3 main layers:

  1. Conv2D->ReLU->BatchNorm2D->MaxPool2D->Dropout2D
  2. Conv2D->ReLU->BatchNorm2D->Flatten->Dropout2D
  3. Linear->ReLU->BatchNorm1D->Dropout

followed by a final fully connected layer and a softmax. My optimizer is AdamW and the loss function is cross-entropy. The network seems to be training well, since the training accuracy keeps increasing, but the validation accuracy stays almost fixed at chance level (1/number of classes). The accuracy is shown in the image below:

[Figure: accuracy of training and test]

And the loss is shown in:

[Figure: loss for training and validation]

Any idea why this is happening, and how can I improve the validation accuracy? I have already used L1 and L2 regularization as well as dropout layers, and I have also tried adding more data, but none of this helped.
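For reference, here is a rough sketch of the setup. The channel counts, kernel sizes, and the 32x32 input resolution are placeholder assumptions, not my exact values:

```python
import torch
import torch.nn as nn

class SixClassCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        # 1. Conv2D -> ReLU -> BatchNorm2D -> MaxPool2D -> Dropout2D
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.MaxPool2d(2),
            nn.Dropout2d(0.25),
        )
        # 2. Conv2D -> ReLU -> BatchNorm2D -> Flatten -> Dropout
        self.block2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Flatten(),
            nn.Dropout(0.25),  # plain Dropout after flattening (Dropout2d expects spatial maps)
        )
        # 3. Linear -> ReLU -> BatchNorm1D -> Dropout
        self.block3 = nn.Sequential(
            nn.Linear(64 * 16 * 16, 128),  # assumes 32x32 inputs, halved once by the pool
            nn.ReLU(),
            nn.BatchNorm1d(128),
            nn.Dropout(0.5),
        )
        self.fc = nn.Linear(128, num_classes)  # final fully connected layer

    def forward(self, x):
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself,
        # so no explicit softmax layer is needed before the loss.
        return self.fc(self.block3(self.block2(self.block1(x))))

model = SixClassCNN()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)
criterion = nn.CrossEntropyLoss()
```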

gazelle

2 Answers


Problem solved: at first I treated this as overfitting and spent a lot of time on remedies such as regularization and augmentation. After trying different methods without any improvement in validation accuracy, I went back through the data. I found a bug in my data preparation that was generating similar tensors under different labels. Once I regenerated the data correctly, the problem was largely solved (the validation accuracy increased to around 60%). I then pushed the validation accuracy to 90% by adding more "conv2d + maxpool" layers.
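In case it helps anyone else: a quick sanity check that would have caught my bug is to hash every input tensor and flag identical tensors that appear under different labels. A minimal sketch, where `dataset` is a stand-in for any iterable of (CPU image tensor, integer label) pairs:

```python
import hashlib
from collections import defaultdict

# Map each input tensor's content hash to the set of labels it appears under.
labels_per_hash = defaultdict(set)
for image, label in dataset:  # `dataset`: iterable of (CPU tensor, int label) pairs
    digest = hashlib.sha1(image.numpy().tobytes()).hexdigest()
    labels_per_hash[digest].add(int(label))

# A hash seen under more than one label means the same tensor was
# generated for different classes -- exactly the bug described above.
conflicts = {h: ls for h, ls in labels_per_hash.items() if len(ls) > 1}
print(f"{len(conflicts)} inputs appear under multiple labels")
```

Note that this only catches exact duplicates; near-duplicate tensors would need a fuzzier comparison.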

gazelle

This is not so much a programming question, so you might get better answers by asking it again on Cross Validated, and it would be easier to help if you posted your architecture code. That said, here are a few things I would suggest:

  • You wrote that you "tried adding more data". If you can, always use all the data you have. If that's still not enough (and even if it is), use augmentation, e.g. flips, crops, and added noise, as in the sketch after this list.
  • Your learning rate should not be that small: start with 0.001 and decay it while training, or try around 0.0001 without decay.
  • Remove the dropout after the conv layers and the batch norm after the dense layers and see if that helps. Dropout after conv layers is not very common, though normally it shouldn't have a negative effect; try it anyway.
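A rough sketch of the first two points, using torchvision transforms for augmentation and a step decay of the learning rate. The transform parameters and decay schedule are example values, not tuned ones:

```python
import torch
import torch.nn as nn
from torchvision import transforms

# On-the-fly augmentation for the training images (applied to PIL images).
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),  # assumes 32x32 inputs
    transforms.ToTensor(),
])

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 6))  # placeholder; use your network

# Start at 0.001 and decay the learning rate by 10x every 20 epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(60):
    # ... usual training loop over the DataLoader goes here ...
    scheduler.step()
```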
Theodor Peifer