
Unfortunately, I can't use --shuffle while creating the LMDB.

So, I was advised to shuffle train.txt before creating LMDB.
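For reference, shuffling the list can be done with a short Python sketch like the one below (the output file name and the random seed are assumptions for illustration, not part of the original setup):

```python
import random

# Read the image/label list used by convert_imageset, shuffle the
# lines in memory, and write them back out to a new file.
random.seed(42)  # assumed seed, only so runs are repeatable

with open("train.txt") as f:
    lines = f.readlines()

random.shuffle(lines)

with open("train_shuffled.txt", "w") as f:  # assumed output name
    f.writelines(lines)
```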

After shuffling, train.txt looks like this:

n07747607/n07747607_28410.JPEG 950
n02111277/n02111277_55668.JPEG 256
n02091831/n02091831_4757.JPEG 176
n04599235/n04599235_10544.JPEG 911
n03240683/n03240683_14669.JPEG 540

After creating the LMDBs for TEST and TRAIN, I'm trying to train Caffe on bvlc_reference_caffenet.
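For context, a minimal pycaffe sketch of how such a run could be launched (the usual route is the `caffe train` command-line tool; the solver path is the stock bvlc_reference_caffenet one shipped with Caffe, and the GPU id is an assumption):

```python
import caffe

# Equivalent of `caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt`
caffe.set_device(0)      # assumed GPU id
caffe.set_mode_gpu()

solver = caffe.get_solver("models/bvlc_reference_caffenet/solver.prototxt")
solver.solve()           # runs until max_iter from the solver definition
```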

There is only one problem: after more than 10 thousand iterations, accuracy = 0.001 and loss = 6.9, which, as I understand it, means the network isn't learning and is just guessing.

Can you point out what I'm doing wrong? Thank you.

Aleksander Monk
  • Not from this alone, I'm afraid. Even without shuffling at all, you should get better results than this. What data set and batch size are you using? Most notably, how many epochs have you run with 10K iterations? If this is on the full ILSVRC2012 data set, then you've run 2 epochs, and I *think* you should be seeing better numbers than you have. – Prune Mar 08 '17 at 01:55
  • It seems like your list is shuffled correctly. Are you sure this file was used to create the LMDB? What solver `type` are you using? – Shai Mar 08 '17 at 08:08
  • @Shai I didn't change it, so it's SGD. – Aleksander Monk Mar 08 '17 at 18:37
  • @Prune It's on full ILSVRC2012 and batch size is 256 – Aleksander Monk Mar 08 '17 at 18:37
  • Rats. This file is a modified AlexNet, I think? I'm running out of ideas. Other factors that might help someone: how many nodes (assuming one, since you didn't specify). GPU or CPU? I did a little research last night, and it looks like the AlexNet family, in general, show progress on the loss function, even through the first epoch. From what I found, we can expect the loss function to be in the low 5+ range after two epochs. – Prune Mar 08 '17 at 19:03
  • Oh ... is your solver file still intact? getting the hyper-parameters wrong can ruin the training in a number of ways. The simplest is if your learning rate somehow got set to 0. Also, if you have aggressive hyper-parameters and random initialization of weights and parameters, it's possible that a bad run of random numbers can give you a brain-dead training. – Prune Mar 08 '17 at 19:06
  • Have you tried setting debug_info: true? – Shai Mar 08 '17 at 21:09
  • What about using a larger LR while training? – Anoop K. Prabhu Mar 09 '17 at 12:12
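Following up on the comments about hyper-parameters and debug_info, a minimal sketch of sanity-checking the solver settings with Caffe's protobuf bindings (the solver path is an assumption; adjust to the file actually used):

```python
from caffe.proto import caffe_pb2
from google.protobuf import text_format

# Quick check of the hyper-parameters mentioned in the comments:
# a base_lr of 0 would stall training, and debug_info: true makes
# Caffe log per-layer data/gradient statistics during training.
solver = caffe_pb2.SolverParameter()
with open("models/bvlc_reference_caffenet/solver.prototxt") as f:  # assumed path
    text_format.Merge(f.read(), solver)

print("base_lr:     ", solver.base_lr)
print("momentum:    ", solver.momentum)
print("weight_decay:", solver.weight_decay)
print("debug_info:  ", solver.debug_info)
```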

0 Answers