Resnet training - L2 loss decreases while cross-entropy stays around 0.69

Question

I am using the official TensorFlow implementation of ResNet (https://github.com/tensorflow/models/tree/master/official/resnet) to train a binary classifier on my own dataset. I modified the input_fn in imagenet_main.py slightly to do my own image loading and preprocessing. After many rounds of parameter tuning, I still can't get the model to train properly. The best I can find is a set of parameters where training accuracy climbs to 100% while validation accuracy stays around 50% indefinitely. The implementation uses a piecewise learning-rate schedule. I tried initial learning rates from 0.1 to 1e-5 and weight decay from 1e-2 to 1e-5, and validation accuracy never improved with any of them.
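For readers unfamiliar with the term, a piecewise learning rate holds the rate constant within each stage of training and drops it at fixed boundaries. A minimal sketch in plain Python follows; the function name, boundary epochs, and decay factors are illustrative assumptions, not values taken from the repository.

```python
import bisect

def piecewise_learning_rate(epoch, initial_lr, boundaries, factors):
    """Return the learning rate for `epoch` under a piecewise-constant schedule.

    boundaries: sorted epochs at which the rate changes (hypothetical values).
    factors: multipliers on initial_lr, one more entry than boundaries.
    """
    if len(factors) != len(boundaries) + 1:
        raise ValueError("factors needs exactly one more entry than boundaries")
    # bisect_right counts how many boundaries have been reached or passed.
    stage = bisect.bisect_right(boundaries, epoch)
    return initial_lr * factors[stage]

# Hypothetical schedule: divide the rate by 10 at epochs 30 and 60.
boundaries = [30, 60]
factors = [1.0, 0.1, 0.01]
for epoch in (0, 29, 30, 60):
    print(epoch, piecewise_learning_rate(epoch, 0.1, boundaries, factors))
```

Sweeping the initial learning rate, as described above, scales every stage of such a schedule by the same amount.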

A suspicious observation is that during training the L2 loss decreases slowly and steadily, while the cross-entropy is very reluctant to decrease and stays around 0.69.
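One way to read the 0.69 figure: it is close to ln 2 ≈ 0.693, which is the cross-entropy a two-class model gets when it assigns probability 0.5 to every example, that is, when its outputs carry no information about the label. A small check in plain Python (the helper name is made up for illustration):

```python
import math

def binary_cross_entropy(labels, probs):
    """Mean cross-entropy for binary labels and predicted P(class 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# A model that outputs 0.5 for every example, regardless of the label.
labels = [0, 1, 1, 0]
chance_loss = binary_cross_entropy(labels, [0.5] * len(labels))
print(chance_loss, math.log(2))
```

A loss that sits at this value may mean the predictions are not yet separating the two classes, though this sketch alone does not say why.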

Any ideas on what I can try next?

Regarding my dataset and image preprocessing: the training set is around 100K images and the validation set is around 10K. I just resize each image to 224*224 while keeping the aspect ratio, then subtract 127 from each channel and divide by 255.
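The normalization step described above can be sketched with NumPy as follows. The function name is hypothetical, and the resize step is omitted because the post does not say whether the aspect ratio is preserved by padding or by cropping.

```python
import numpy as np

def normalize_image(image_uint8):
    """Subtract 127 from every channel, then divide by 255 (as in the post)."""
    image = image_uint8.astype(np.float32)
    return (image - 127.0) / 255.0

# Stand-in for an already resized 224x224 RGB image.
dummy = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
normalized = normalize_image(dummy)
print(normalized.shape, normalized.dtype, normalized.min(), normalized.max())
```

With this scaling, pixel values land in roughly [-0.5, 0.5].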

@SohaibAnwaar Thanks for asking. I'm running it on my own image collection. The training set is around 100K images and the validation set is around 10K. For preprocessing, I just resize each image to 224*224 while keeping the aspect ratio, then subtract 127 from each channel and divide by 255. — Hua, Jun 26 '19 at 08:33
The dataset is quite balanced, I think. Each class has around 50K. — Hua, Jun 26 '19 at 08:35
Have you applied any image-processing techniques? Also, in binary classification we always keep more samples of one class than of the other, because if the model learns to predict one class accurately, it will be able to predict the other class too, like an if/else. And try a pre-trained model on your dataset if it is similar to ImageNet. — Sohaib Anwaar, Jun 26 '19 at 09:11
@SohaibAnwaar Apart from the resizing and normalization I mentioned above, I did no other image processing. I haven't tried a pretrained model in TensorFlow, but I tried the PyTorch version of ResNet on my dataset and it worked, both from scratch and from pretrained weights. I'm just curious why this TensorFlow version didn't work. — Hua, Jun 26 '19 at 10:02
@Hua ResNet has a very large number of trainable parameters, and it is trained on ImageNet, which has 1k classes, whereas your dataset has only two. The dense layers of ResNet have 4k neurons, which increases the number of trainable parameters. The number of parameters is directly related to the risk of overfitting, which means the ResNet model is not suitable for your data as it stands. Please make some changes to ResNet and try to decrease the number of parameters. That may help. — Sohaib Anwaar, Jun 26 '19 at 10:20
@SohaibAnwaar I see, thanks for your advice :-) I'll try reducing the parameters in the final dense layer. — Hua, Jun 26 '19 at 10:26
Yes, try to reduce the overall number of parameters in your architecture! And kindly mark the answer as accepted if it was helpful. — Sohaib Anwaar, Jun 26 '19 at 10:29

score 0 · Accepted Answer · answered Jun 26 '19 at 10:29

@Hua ResNet has a very large number of trainable parameters, and it is trained on ImageNet, which has 1k classes, whereas your dataset has only two. The dense layers of ResNet have 4k neurons, which increases the number of trainable parameters. The number of parameters is directly related to the risk of overfitting, which means the ResNet model is not suitable for your data as it stands. Please make some changes to ResNet and try to decrease the number of parameters. That may help.
