I have a custom dataset of approximately 20k images (10% used for validation). Roughly 1/3 are labeled class 0, 1/3 are labeled class 1, and 1/3 contain no class 0 or class 1 objects and carry a -1 label.
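A quick way to confirm that the 1/3, 1/3, 1/3 split holds in both the training and validation sets is to count image-level composition. This is a minimal sketch, assuming each image is represented as a list of its box class ids, with an empty list or a -1 entry meaning a background-only image; the function name `summarize_composition` is hypothetical.

```python
from collections import Counter

def summarize_composition(image_labels):
    """Fraction of images containing each class, plus background-only images.

    image_labels: one list of class ids per image. Negative ids (e.g. -1) and
    empty lists are treated as "no object". An image containing both classes
    counts toward both, so the fractions can sum to more than 1.
    """
    counts = Counter()
    for labels in image_labels:
        present = {c for c in labels if c >= 0}  # drop -1 "no object" markers
        if not present:
            counts["background"] += 1
        else:
            for cls in sorted(present):
                counts[f"class_{cls}"] += 1
    total = len(image_labels)
    return {key: value / total for key, value in counts.items()}

if __name__ == "__main__":
    toy = [[0], [0, 0], [1], [-1], [], [1, 0]]
    print(summarize_composition(toy))
```

Running it separately on the training and validation label lists shows whether the background share is similar in both.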
I have run approximately 400 epochs. Over the last 40 epochs, validation mAP has increased from 0.817 to 0.831, and training cross-entropy loss has decreased from 0.377 to 0.356.
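To put that trend in per-epoch terms, a small calculation over the 40-epoch window helps. This is a sketch, assuming the change is spread evenly over the window (it usually is not, since gains tend to shrink as training goes on); `per_epoch_change` and `epochs_for_gain` are hypothetical helper names.

```python
import math

def per_epoch_change(start, end, epochs):
    """Average change per epoch over a window; negative means a decrease."""
    return (end - start) / epochs

def epochs_for_gain(target_gain, gain_per_epoch):
    """Epochs needed for target_gain if the per-epoch gain stayed constant.

    Because gains usually shrink, treat this as an optimistic lower bound.
    """
    if gain_per_epoch <= 0:
        raise ValueError("metric is not improving")
    return math.ceil(target_gain / gain_per_epoch)

map_rate = per_epoch_change(0.817, 0.831, 40)   # roughly +0.00035 mAP per epoch
loss_rate = per_epoch_change(0.377, 0.356, 40)  # roughly -0.000525 per epoch
print(f"mAP {map_rate:+.5f}/epoch, loss {loss_rate:+.6f}/epoch")
print("epochs for +0.01 mAP at this rate:", epochs_for_gain(0.01, map_rate))
```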
The last epoch logged:
validation mAP <score>=(0.83138943309)
train cross_entropy <loss>=(0.356147519184)
train smooth_l1 <loss>=(0.150637295831)
The training loss still seems to have a reasonable amount of room to decrease, but I don't have any experience with ResNet (with YOLOv3, the loss on this dataset quickly went below 0.1).
Is my approach of having 1/3 of the training images contain neither class reasonable? When I was training YOLOv3, it seemed to help the network avoid false positives.
Is there any rule of thumb that helps me estimate how many epochs are appropriate based on the number of classes/images?
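Epoch counts are hard to predict from class or image counts alone, so a common alternative is to stop once validation mAP plateaus. The sketch below shows a simple patience-based rule, assuming you have the per-epoch validation mAP values as a list; `should_stop`, `patience`, and `min_delta` are illustrative names and defaults, not SageMaker settings. SageMaker's built-in object detection algorithm documents its own early-stopping hyperparameters (such as `early_stopping` and `early_stopping_patience`); check its hyperparameter reference for their exact semantics.

```python
def should_stop(history, patience=5, min_delta=0.001):
    """Return True when validation mAP has plateaued.

    history: per-epoch validation mAP values, oldest first.
    Stops when the best value in the last `patience` epochs fails to beat
    the best value seen before that window by at least `min_delta`.
    """
    if len(history) <= patience:
        return False  # not enough epochs to judge yet
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent < best_before + min_delta

# Example: a run that is still improving vs. one that has flattened out.
improving = [0.50, 0.60, 0.70, 0.80, 0.81, 0.82, 0.83]
flat = [0.80, 0.83, 0.83, 0.83, 0.83]
print(should_stop(improving, patience=3))  # False
print(should_stop(flat, patience=3))       # True
```

With the numbers quoted above (about +0.014 mAP over 40 epochs), a rule like this with a small `min_delta` would still keep training; the `min_delta` you choose effectively encodes how much mAP per window is worth paying for.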
It's cost me about $100 on AWS to get to this point, and I'm not sure whether it needs another $100 or another $1,000 to reach the optimal mAP. At the current rate, each hour of training yields about a 1% improvement, and I'd expect that to slow down.
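The cost question can be framed with a little arithmetic. This is a rough sketch, assuming the ~$100 covers the ~400 epochs run so far at a constant cost per epoch, and using the observed +0.014 mAP over the last 40 epochs (about 0.00035 per epoch); the `decay` factor in `max_remaining_gain` is a hypothetical knob you would pick, not something measured, and all helper names are illustrative.

```python
def cost_per_epoch(total_dollars, epochs):
    """Average spend per epoch so far."""
    return total_dollars / epochs

def projected_cost(target_gain, gain_per_epoch, dollars_per_epoch):
    """Dollars to gain target_gain mAP if the per-epoch gain held steady.

    Gains usually shrink over time, so this is an optimistic lower bound.
    """
    if gain_per_epoch <= 0:
        raise ValueError("no measurable improvement; cost is unbounded")
    return target_gain / gain_per_epoch * dollars_per_epoch

def max_remaining_gain(last_window_gain, decay):
    """Total future gain if each window gains `decay` times the previous one.

    Geometric series: g*d + g*d**2 + ... = g*d / (1 - d), for 0 <= d < 1.
    """
    if not 0 <= decay < 1:
        raise ValueError("decay must be in [0, 1)")
    return last_window_gain * decay / (1 - decay)

dollars = cost_per_epoch(100, 400)               # about $0.25 per epoch
print(projected_cost(0.01, 0.00035, dollars))    # roughly $7 for +0.01 mAP
print(max_remaining_gain(0.014, 0.5))            # about 0.014 more mAP in total
```

With a decay of 0.5 (each 40-epoch window gaining half the previous one), the remaining headroom would be about 0.014 mAP, a ceiling near 0.845; a slower decay raises both the ceiling and the bill.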
Are there other metrics I should be looking at? If so, how do I export them?
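The metrics quoted above appear as text lines in the training job's logs (SageMaker training jobs write their logs to Amazon CloudWatch Logs). Below is a minimal sketch for turning such lines into a CSV, assuming the "split metric <score or loss>=(value)" pattern seen in the quoted lines holds for every epoch; `parse_metrics` and `to_csv` are hypothetical helpers. If the metrics are also published to CloudWatch Metrics, the SageMaker Python SDK's `TrainingJobAnalytics` class is another route.

```python
import csv
import io
import re

# Matches e.g. "validation mAP <score>=(0.83)" or "train smooth_l1 <loss>=(0.15)".
METRIC_RE = re.compile(r"(\w+) (\S+) <(score|loss)>=\(([^)]+)\)")

def parse_metrics(log_text):
    """Return (split, metric, value) tuples in the order they appear."""
    return [
        (m.group(1), m.group(2), float(m.group(4)))
        for m in METRIC_RE.finditer(log_text)
    ]

def to_csv(rows, fileobj):
    """Write parsed rows to an open text file (or any file-like object)."""
    writer = csv.writer(fileobj)
    writer.writerow(["split", "metric", "value"])
    writer.writerows(rows)

sample = """validation mAP <score>=(0.83138943309)
train cross_entropy <loss>=(0.356147519184)
train smooth_l1 <loss>=(0.150637295831)"""

buffer = io.StringIO()
to_csv(parse_metrics(sample), buffer)
print(buffer.getvalue())
```

If your log lines also carry an epoch number, extending the regex to capture it lets you plot each metric against epoch.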
Are there any hyperparameters I should change before resuming training?
My hyperparameters are:
base_network='resnet-50',
num_classes=2,
mini_batch_size=32,
epochs=200,
learning_rate=0.001,
lr_scheduler_step='3,6',
lr_scheduler_factor=0.1,
optimizer='sgd',
momentum=0.9,
weight_decay=0.0005,
overlap_threshold=0.5,
nms_threshold=0.45,
image_shape=416,
label_width=480,
num_training_samples=19732
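It may help to see what `lr_scheduler_step='3,6'` with `lr_scheduler_factor=0.1` implies over a long run. This is a minimal sketch, assuming the rate is multiplied by the factor once at each listed epoch (whether the drop applies at the start of that epoch or the next is an off-by-one detail worth checking in the docs, and how a resumed job restarts the schedule is not modeled here); `learning_rate_at` is a hypothetical helper.

```python
def learning_rate_at(epoch, base_lr=0.001, steps="3,6", factor=0.1):
    """Learning rate in effect at `epoch` under a step schedule.

    The rate is multiplied by `factor` once for every milestone in `steps`
    (a comma-separated string of epoch numbers) that `epoch` has reached.
    """
    milestones = [int(s) for s in steps.split(",") if s.strip()]
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops

for epoch in (0, 3, 6, 100, 399):
    print(epoch, learning_rate_at(epoch))
```

Under that reading, the rate is already down to about 1e-5 from epoch 6 onward, so nearly all of the ~400 epochs ran at 1% of the initial rate, which is consistent with the slow late-stage gains described above.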
Thanks, John