I've been running the EfficientNet code from Google on my own image datasets and have run into the following problem. For each variant of the architecture (b0 to b7), the training and validation loss decrease until roughly epoch 100, after which both start to increase rapidly while the validation accuracy drops correspondingly.
I've not seen this pattern anywhere before. My suspicion is overfitting, but if the model were overfitting, wouldn't the training loss continue to decrease rather than increase?
Looking at other SO questions, this one comes closest to what I mean, but I'm not sure it's the same issue. If this is a vanishing gradient problem, how come the folks at Google didn't run into it when training on ImageNet?
Setup
This was run following the EfficientNet tutorial. My dataset consists of 41k training images and 5k validation images.
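For reference, here is a minimal sketch of the kind of training run involved (not my exact script: the directory paths, `NUM_CLASSES`, and epoch count are placeholders, and the tutorial's data pipeline differs in the details):

```python
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0
import matplotlib.pyplot as plt

NUM_CLASSES = 10   # placeholder; my real dataset has a different class count
IMG_SIZE = 224     # input resolution for the b0 variant
BATCH_SIZE = 32

# Load the train/validation splits from directories (41k train / 5k val).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/val", image_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE)

# EfficientNetB0 backbone with ImageNet weights and a new classification head.
# (Keras' EfficientNet normalizes inputs internally, so raw [0, 255] images
# from image_dataset_from_directory are fed in directly.)
base = EfficientNetB0(include_top=False, weights="imagenet",
                      input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
model = tf.keras.Model(base.input, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_ds, validation_data=val_ds, epochs=300)

# Plot the curves described above: both losses fall until ~epoch 100,
# then climb while validation accuracy drops.
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="val loss")
plt.plot(history.history["val_accuracy"], label="val accuracy")
plt.xlabel("epoch")
plt.legend()
plt.show()
```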