
I've implemented a home-brewed ZFNet (prototxt) for my research. After 20k iterations with this definition, the test accuracy stays at ~0.001 (i.e., 1/1000), and both the test loss and the training loss stay at ~6.9, which suggests the net just keeps playing a guessing game among the 1,000 classes. I've thoroughly checked the whole definition and changed some of the hyper-parameters to start new training runs, but to no avail: the same results show up on the screen every time.

Could anyone shed some light on this? Thanks in advance!


The hyper-parameters in the prototxt are derived from the paper [1]. All the inputs and outputs of the layers seem correct, as Fig. 3 in the paper suggests.

The tweaks are:

  • the crop size of the input for both training and testing is set to 225 instead of 224, as discussed in #33;

  • one-pixel zero padding for conv3, conv4, and conv5 to keep the sizes of the blobs consistent [1];

  • filler types for all learnable layers changed from constant in [1] to gaussian with std: 0.01;

  • weight_decay changed from 0.0005 to 0.00025, as suggested by @sergeyk in PR #33.
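To make the padding and filler tweaks concrete, a conv layer in my definition looks roughly like the sketch below (the bottom blob name and num_output here are illustrative placeholders, not copied from my actual prototxt):

```
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"          # placeholder bottom blob name
  top: "conv3"
  convolution_param {
    num_output: 384
    kernel_size: 3
    pad: 1                 # one-pixel zero padding to keep blob sizes consistent
    weight_filler {
      type: "gaussian"     # changed from "constant" in [1]
      std: 0.01
    }
  }
}
```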

[1] Zeiler, M. and Fergus, R. Visualizing and Understanding Convolutional Networks, ECCV 2014.

As for the problematic definition itself, I pasted it here:


1 Answer


A few suggestions:

  1. Change the initialization from gaussian to xavier.
  2. Work with "PReLU" activations instead of "ReLU". Once your net converges, you can fine-tune to remove them.
  3. Try reducing base_lr by an order of magnitude (or even two orders).
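In prototxt terms, the three suggestions look roughly like this (a sketch; the layer and blob names are placeholders, and the exact values are up to you):

```
# 1. In the net prototxt: xavier initialization
weight_filler {
  type: "xavier"
}

# 2. Replace each ReLU layer with a PReLU layer (in-place on the same blob)
layer {
  name: "relu1"
  type: "PReLU"
  bottom: "conv1"
  top: "conv1"
}

# 3. In the solver prototxt: reduce the base learning rate
base_lr: 0.001   # one order of magnitude below 0.01
```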
Shai
    Thanks for your suggestions. I will change the params, launch new experiments, and report back here. @Shai – stoneyang Sep 25 '16 at 13:08
    Just dropping `base_lr` from `0.01` to `0.001` improved `accuracy` to above `0.47` after about 122k iterations. I will return to report the final results after 700k iterations. @Shai – stoneyang Sep 26 '16 at 12:19
    I raised `base_lr` from `0.001` to `0.005`, ie, just a half of the `base_lr` in my initial settings, the test accuracy is `60.x %` after `700k` iterations, which is nearly 3 points higher than Caffenet or AlexNet replicated in official caffe. One of your suggestions saved my day and the rest may be helpful in generic training tasks. @Shai – stoneyang Oct 03 '16 at 14:50