
I want to train CaffeNet on the MNIST dataset in Caffe. However, I noticed that after 100 iterations the loss has dropped only slightly (from 2.66364 to 2.29882).

In contrast, when I use LeNet on MNIST, the loss goes from 2.41197 to 0.22359 after 100 iterations.

Does this happen because CaffeNet has more layers, and therefore needs more training time to converge? Or is it due to something else? I made sure the solver.prototxt files of the two nets were identical.

While I know 100 iterations is extremely short (CaffeNet usually trains for ~300-400k iterations), I find it odd that LeNet reaches such a small loss so soon.
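For reference, both runs used solver settings along these lines (essentially Caffe's stock examples/mnist/lenet_solver.prototxt, with the net field swapped for the CaffeNet run; the values shown are illustrative):

    net: "examples/mnist/lenet_train_test.prototxt"  # swapped for the CaffeNet definition in the other run
    test_iter: 100
    test_interval: 500
    base_lr: 0.01        # initial learning rate
    momentum: 0.9
    weight_decay: 0.0005
    lr_policy: "inv"     # learning-rate decay policy
    gamma: 0.0001
    power: 0.75
    display: 100
    max_iter: 10000
    snapshot_prefix: "examples/mnist/lenet"
    solver_mode: GPU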

apples-oranges

1 Answer


I am not familiar with the architecture of these nets, but in general there are several possible reasons:

1) One of the nets is simply much more complex, and therefore converges more slowly.

2) One of the nets was trained with a larger learning rate (see the solver sketch below).

3) One net may have been trained with momentum while the other was not.

4) It is also possible that both were trained with momentum, but one of them had a larger momentum coefficient.

Really, there are tons of possible explanations for that.
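In Caffe specifically, the learning rate and momentum from points 2)-4) are set in the solver configuration, so that is the first place to compare. A minimal sketch using standard SolverParameter fields (values illustrative):

    base_lr: 0.01      # 2) initial learning rate
    momentum: 0.9      # 3)/4) momentum coefficient (0 = no momentum)
    lr_policy: "inv"   # schedule by which the learning rate decays
    gamma: 0.0001
    power: 0.75

If both solver.prototxt files really are identical, these can be ruled out, which leaves the difference in network complexity.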

Maksim Khaitovich