
When training a deep CNN, a common practice is to use SGD with momentum together with a "step" learning rate policy (e.g. the learning rate is set to 0.1, 0.01, 0.001, ... at different stages of training). But I encountered an unexpected phenomenon when training with this strategy under MXNet.
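
For reference, a minimal sketch of how such a step policy is usually set up with MXNet's built-in scheduler; this is not code from my run, and the batch count, step epochs and hyperparameters below are placeholders:

import mxnet as mx

batches_per_epoch = 1000                   # placeholder: num_training_samples // batch_size
# Drop the learning rate by 10x after epochs 30 and 60, i.e. 0.1 -> 0.01 -> 0.001.
schedule = mx.lr_scheduler.MultiFactorScheduler(
    step=[30 * batches_per_epoch, 60 * batches_per_epoch],
    factor=0.1)
sgd = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, wd=1e-4,
                       lr_scheduler=schedule)
# The optimizer is then attached to the module, e.g. mod.init_optimizer(optimizer=sgd)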

The unexpected behaviour is that the training loss value becomes periodic: https://user-images.githubusercontent.com/26757001/31327825-356401b6-ad04-11e7-9aeb-3f690bc50df2.png

The above is the training loss at a fixed learning rate of 0.01, where the loss decreases normally: https://user-images.githubusercontent.com/26757001/31327872-8093c3c4-ad04-11e7-8fbd-327b3916b278.png

However, at the second stage of training (with lr = 0.001), the loss goes up and down periodically, and the period is exactly one epoch.

So I thought it might be a problem with data shuffling, but that cannot explain why it doesn't happen in the first stage. I used ImageRecordIter as the DataIter and reset it after every epoch. Is there anything I missed or set incorrectly?

import mxnet as mx

train_iter = mx.io.ImageRecordIter(
    path_imgrec=recPath,
    data_shape=dataShape,
    batch_size=batchSize,
    last_batch_handle='discard',  # drop the final incomplete batch of each epoch
    shuffle=True,                 # shuffle the order of the training records
    rand_crop=True,               # random cropping for augmentation
    rand_mirror=True)             # random horizontal flipping for augmentation
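
As a sanity check on the shuffling suspicion, one can compare the labels of the first batch across two consecutive resets; this snippet is only illustrative (first_batch_labels is a made-up helper, not part of my code):

import numpy as np

def first_batch_labels(data_iter):
    # Reset the iterator and return the label array of its first batch.
    data_iter.reset()
    for batch in data_iter:
        return batch.label[0].asnumpy()

labels_a = first_batch_labels(train_iter)
labels_b = first_batch_labels(train_iter)
# If this prints True on every run, the iterator is replaying the same ordering
# (or even the same batch) after reset() instead of reshuffling.
print('first batches identical across resets:', np.array_equal(labels_a, labels_b))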

The code for training and loss evaluation:

while True:
    train_iter.reset()                      # start the next epoch
    for i, databatch in enumerate(train_iter):
        globalIter += 1
        mod.forward(databatch, is_train=True)
        mod.update_metric(metric, databatch.label)
        if globalIter % 100 == 0:
            loss = metric.get()[1]          # metric value accumulated over the last 100 batches
            metric.reset()
            mod.backward()
            mod.update()

Actually, the loss can converge, but it takes too long. I've suffered from this problem for a long time, on different networks and different datasets, and I didn't have it when using Caffe. Is this due to an implementation difference?

lebelinoz

1 Answer


Your loss/learning curves look suspiciously smooth, and I believe you could observe the same oscillation in the loss even with the learning rate set to 0.01, just at a smaller relative scale (i.e. if you 'zoomed in' on the chart you'd see the same pattern). You may have an issue with your data iterator passing the same batch, for example. Your training loop also looks wrong, though that could just be the formatting of the post (e.g. performing mod.update() only once every 100 batches isn't correct).
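
To make the per-batch-update point concrete, here is a rough sketch (not your code) of letting Module.fit() drive the loop instead, since it performs forward/backward/update on every batch and resets the metric once per epoch; mod, train_iter and batchSize are taken from your question, while the metric, step schedule and epoch count are placeholders:

import mxnet as mx

mod.fit(
    train_iter,
    eval_metric=mx.metric.CrossEntropy(),    # training loss, assuming a softmax output
    optimizer='sgd',
    optimizer_params={
        'learning_rate': 0.1,
        'momentum': 0.9,
        # placeholder step schedule: drop the lr 10x after 30k and 60k updates
        'lr_scheduler': mx.lr_scheduler.MultiFactorScheduler(step=[30000, 60000],
                                                             factor=0.1)},
    batch_end_callback=mx.callback.Speedometer(batchSize, 100),  # log the metric every 100 batches
    num_epoch=90)                            # placeholder epoch count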

You can observe periodicity in your loss when you're traveling across a valley in the loss surface, up and down the sides rather than down the valley. Choosing a lower learning rate can help fix this, and make sure you are using momentum too.
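
To make the valley picture concrete, here is a small self-contained toy in plain NumPy (nothing to do with MXNet or your model; the curvatures, learning rates and step count are made up): with a step that is too large for the steep direction, the iterate bounces from one valley wall to the other; a smaller step removes the bounce but crawls along the valley floor, and momentum recovers speed along the floor.

import numpy as np

# Toy 2D loss surface f(p) = 0.5 * sum(curv * p**2): steep in x, shallow in y.
curv = np.array([50.0, 1.0])

def run_sgd(lr, momentum=0.0, steps=30):
    p = np.array([1.0, 1.0])                 # start on the valley wall
    v = np.zeros(2)
    xs = []
    for _ in range(steps):
        v = momentum * v - lr * curv * p     # gradient of f is curv * p
        p = p + v
        xs.append(p[0])
    return np.array(xs), np.linalg.norm(p)

xs_big, d_big     = run_sgd(lr=0.035)                # step too large for the steep direction
xs_small, d_small = run_sgd(lr=0.010)                # smaller step, no momentum
_, d_mom          = run_sgd(lr=0.010, momentum=0.9)  # smaller step plus momentum

print('x with lr=0.035:', np.round(xs_big[:6], 3))    # alternating signs: bouncing wall to wall
print('x with lr=0.010:', np.round(xs_small[:6], 3))  # monotone: no bouncing
# The small step alone makes the slowest overall progress; adding momentum
# gives the smallest distance to the minimum of the three runs here.
print('distance to minimum after 30 steps:',
      round(d_big, 3), round(d_small, 3), round(d_mom, 3))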

Thom Lane