I am fine-tuning with Caffe on an image dataset on a Tesla K40. Using batch_size=47, solver_type=SGD, base_lr=0.001, lr_policy="step", momentum=0.9, gamma=0.1, the training loss decreases and the test accuracy goes from 2% to 50% in 100 iterations, which is quite good.
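
For reference, a minimal sketch of the solver.prototxt settings described above (the net path, stepsize, test settings and max_iter are placeholders, not values from my actual files; the batch size lives in train_val.prototxt, not in the solver):

```
# Sketch of the SGD solver settings described above.
# Paths and iteration counts are placeholders; batch size is set
# in train_val.prototxt, not here.
net: "models/finetune/train_val.prototxt"   # placeholder path
test_iter: 100                              # placeholder
test_interval: 100                          # placeholder
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 1000                              # placeholder
momentum: 0.9
solver_type: SGD
max_iter: 10000                             # placeholder
solver_mode: GPU
```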

When using other solvers such as RMSPROP, ADAM and ADADELTA, the training loss stays almost flat and there is no improvement in test accuracy even after 1000 iterations.

For RMSPROP, I have changed the respective parameters as mentioned here.

For ADAM, I have changed the respective parameters as mentioned here.

For ADADELTA, I have changed the respective parameters as mentioned here.
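
Since I haven't pasted the solver files here, the snippet below is only an illustration of the Caffe SolverParameter fields each of these solvers uses, with commonly used default values rather than my actual settings:

```
# Adam (illustrative values only)
solver_type: ADAM
base_lr: 0.001
momentum: 0.9      # beta1
momentum2: 0.999   # beta2
delta: 1e-8        # epsilon

# RMSProp (illustrative values only)
solver_type: RMSPROP
base_lr: 0.001
rms_decay: 0.98
delta: 1e-8

# AdaDelta (illustrative values only)
solver_type: ADADELTA
base_lr: 1.0
momentum: 0.95
delta: 1e-6
```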

Can someone please tell me what I am doing wrong?

VeilEclipse
  • I've found that one should use lower learning rates with solvers other than SGD. However, I don't quite know why. – pir Oct 03 '15 at 08:50
  • How much do you lower the learning rate, compared to SGD? – VeilEclipse Oct 03 '15 at 13:56
  • If I use the same learning rate as with SGD, the RMSProp algorithm diverges, whereas it will converge (with slightly lower accuracy than my well-tuned SGD) with a learning rate that is 1/3 of the original. However, it might be very problem-specific. – pir Oct 04 '15 at 13:57
  • @VeilEclipse: Did you solve your issue? I also ran into it: with Adam and without Adam I get the same result. I am using the same `base_lr` as for SGD. – John Apr 25 '17 at 07:59

1 Answer


I saw similar results to pir: Adam would diverge when given the same base_lr that SGD used. When I reduced base_lr to 1/100 of its original value, Adam suddenly converged, and gave good results.
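
In solver.prototxt terms, the change amounts to something like the sketch below (the Adam-specific values are common defaults, not something taken from my runs):

```
# The SGD run converged with:
#   base_lr: 0.001
# For Adam, the same setup diverged until base_lr was cut to 1/100:
solver_type: ADAM
base_lr: 0.00001   # 1/100 of the SGD value
momentum: 0.9      # beta1
momentum2: 0.999   # beta2
delta: 1e-8        # epsilon
```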

Stu Gla
  • Thanks for pointing that out. So if `base_lr: 1e-3` for SGD, then `base_lr: 1e-5` for Adam. Is that too small? – John Apr 25 '17 at 08:03
  • I have found that 1e-4 is a good learning rate for Adam. You should also try 1e-3 and 1e-5 on your dataset to see if you get good performance. – Stu Gla Apr 25 '17 at 23:04
  • In my case, the learning rate for Adam is twice that of SGD. I also tried 50% and 150%, but 200% of the SGD learning rate is best for me. – John Apr 26 '17 at 03:51