dlib network doesn't converge

Question

I'm working on character recognition using dlib on Linux. When I train my network on 1,000 characters, it converges and reaches 100% accuracy, but when I train it on 10,000 or 100,000 characters it no longer converges to 100% accuracy. The learning rate keeps decreasing, but the number of wrong predictions stops changing. The network I'm using is the following:

// Layers are listed from the output (top) down to the input (bottom).
using net_type = dlib::loss_multiclass_log<
        dlib::fc<nbClassConst,                  // one output per class
        dlib::relu<dlib::fc<120,
        dlib::relu<dlib::fc<400,
        dlib::max_pool<2, 2, 2, 2, dlib::relu<dlib::con<16, 5, 5, 1, 1,
        dlib::max_pool<2, 2, 2, 2, dlib::relu<dlib::con<6, 5, 5, 1, 1,
        dlib::input<dlib::matrix<unsigned char>>
        >>>>>>>>>>>>;

It is based on the dnn_introduction2_ex.cpp example from dlib. I don't know whether I need to tune some of the network's parameters or whether this network is simply not suited to what I want to do. I would appreciate any suggestions or help.
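
Because the last fc layer has nbClassConst outputs, loss_multiclass_log expects every training label to lie in the range [0, nbClassConst). When the number of classes differs between datasets, a quick sanity check on the labels may be worth doing before training. The following is only a minimal sketch: class_histogram is a hypothetical helper (not part of dlib), and it assumes the labels are stored as a std::vector<unsigned long>, as in dlib's example programs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of dlib.
// Counts how many training samples carry each class label.
// Returns an empty vector if any label is >= num_classes, i.e. if the
// label could not be represented by a final fc layer with num_classes outputs.
std::vector<std::size_t> class_histogram(const std::vector<unsigned long>& labels,
                                         unsigned long num_classes)
{
    std::vector<std::size_t> counts(num_classes, 0);
    for (unsigned long label : labels)
    {
        if (label >= num_classes)
            return {};          // out-of-range label found
        ++counts[label];
    }
    return counts;
}
```

Calling it with the training labels and nbClassConst would show both whether any label is out of range and whether some classes have very few samples in the larger set.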

You mean the *loss* is still decreasing, but cross-validation shows no progress in generalization? Are the new samples somewhat different? Did you try randomly sampling 1,000 characters from the 100k set and training on those (to make sure the first 1k are not easy compared to the rest)? Usually, learning-rate and decay parameters that work on a subset of the data also work on the full data (even in theory). - sascha, Jan 16 '17 at 14:56
Thank you for replying. I checked the character images that are not learned properly, and they are really well defined. The learning rate keeps decreasing toward 0, and at some point the average loss stops changing. I didn't adjust the decay parameter. Do I have to modify my network structure if the number of classes is different? - landa, Jan 16 '17 at 15:19
Wait, the number of possible target classes changes with the bigger dataset? That changes things, of course. Parameter tuning becomes more relevant again, and probably the NN architecture too (especially regularization). - sascha, Jan 16 '17 at 15:22
Does that mean I have to completely change my network architecture? (Sorry, I'm new to NNs.) In that case, how do I define the NN architecture for a sample of 200,000 images and 79 classes, and what is the best NN architecture for this case? - landa, Jan 16 '17 at 15:28
Nobody knows. It always depends on the task and data, and the theory of designing these architectures is not well understood in current research. Find a tutorial with some rules of thumb on how to diagnose problems and what to change depending on various observations. - sascha, Jan 16 '17 at 15:32
OK, so I have to experiment with my NN parameters and some other NN architectures in order to find a good one. Thank you for your help. - landa, Jan 16 '17 at 15:34
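
The random-subset check suggested in the comments (train on 1,000 characters drawn at random from the full set, rather than the first 1,000) could be sketched roughly as below. This is an illustration under assumptions, not code from the thread: random_subset_indices is a hypothetical helper, the fixed seed is an arbitrary choice for repeatability, and only the standard library is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper, not part of dlib.
// Returns subset_size distinct indices chosen uniformly at random from
// [0, total). If subset_size exceeds total, all indices are returned.
std::vector<std::size_t> random_subset_indices(std::size_t total,
                                               std::size_t subset_size,
                                               unsigned seed = 42)
{
    std::vector<std::size_t> indices(total);
    std::iota(indices.begin(), indices.end(), std::size_t{0});  // 0 .. total-1

    std::mt19937 rng(seed);                 // fixed seed: repeatable subset
    std::shuffle(indices.begin(), indices.end(), rng);

    indices.resize(std::min(subset_size, total));  // keep the first few
    return indices;
}
```

The returned indices can then be used to copy the matching images and labels into smaller vectors for a trial training run. If that run also fails to reach the accuracy seen on the original 1,000 characters, the first 1,000 samples were probably easier than the rest.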
