I want to train CaffeNet on the MNIST dataset in Caffe. However, I noticed that after 100 iterations the loss only dropped slightly (from 2.66364 to 2.29882).
In contrast, when I train LeNet on MNIST, the loss drops from 2.41197 to 0.22359 after the same 100 iterations.
Does this happen because CaffeNet has more layers, and therefore needs more training time to converge? Or is it due to something else? I made sure both nets used the same solver.prototxt.
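For context, a typical MNIST solver.prototxt looks like the sketch below. These values follow the defaults from Caffe's stock examples/mnist/lenet_solver.prototxt and are shown only for illustration; they are not necessarily my exact settings, and the net path is a placeholder:

```prototxt
# Net definition to train (placeholder path; swap in the CaffeNet
# definition for the other run so both use identical solver settings)
net: "lenet_train_test.prototxt"
base_lr: 0.01          # starting learning rate
momentum: 0.9
weight_decay: 0.0005
# "inv" policy: lr = base_lr * (1 + gamma * iter)^(-power)
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100           # print loss every 100 iterations
max_iter: 10000
snapshot: 5000
snapshot_prefix: "lenet"
solver_mode: GPU
```

The same file is passed to both training runs, so any difference in loss behavior should come from the net definitions themselves rather than the optimization settings.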
While I know 100 iterations is extremely short (CaffeNet usually trains for ~300-400k iterations), I find it odd that LeNet reaches such a small loss so quickly.