
I have a dataset composed of 2 images per observation. The images have shape (1, 128, 118), they are greyscale, and there are 11 classes to classify. What is the best way to approach this with a CNN? How could I optimally define, for example, the number of layers of my CNN, whether or not to use padding, the stride shape, and how many pooling layers to use? Is max pooling or average pooling better?

This is the current configuration of my model:

def create_model(features):
    with C.layers.default_options(init=C.glorot_uniform(), activation=C.ops.relu, pad=True):
        h = features

        h = C.layers.Convolution2D(filter_shape=(5, 5), num_filters=8,
                                   strides=(2, 2), pad=True, name='first_conv')(h)
        h = C.layers.AveragePooling(filter_shape=(5, 5), strides=(2, 2))(h)

        h = C.layers.Convolution2D(filter_shape=(5, 5), num_filters=16, pad=True)(h)
        h = C.layers.AveragePooling(filter_shape=(5, 5), strides=(2, 2))(h)

        h = C.layers.Convolution2D(filter_shape=(5, 5), num_filters=32, pad=True)(h)
        h = C.layers.AveragePooling(filter_shape=(5, 5), strides=(2, 2))(h)

        h = C.layers.Dense(96)(h)
        h = C.layers.Dropout(dropout_rate=0.5)(h)

        r = C.layers.Dense(num_output_classes, activation=None, name='classify')(h)

        return r

z = create_model(x)
# Print the output shapes / parameters of different components
print("Output Shape of the first convolution layer:", z.first_conv.shape)
print("Bias value of the last dense layer:", z.classify.b.value)

I've been experimenting and tweaking the configuration a bit, changing parameter values and adding and removing layers, but my CNN does not seem to be learning from my data. In the best case it converges to a certain point, then hits a wall and the error stops decreasing.

I have found that the learning_rate and num_minibatches_to_train parameters are important. I currently have learning_rate = 0.2 and num_minibatches_to_train = 128, and I'm using sgd as the learner. Here's a sample of my latest output:

Minibatch: 0, Loss: 2.4097, Error: 95.31%
Minibatch: 100, Loss: 2.3449, Error: 95.31%
Minibatch: 200, Loss: 2.3751, Error: 90.62%
Minibatch: 300, Loss: 2.2813, Error: 78.12%
Minibatch: 400, Loss: 2.3478, Error: 84.38%
Minibatch: 500, Loss: 2.3086, Error: 87.50%
Minibatch: 600, Loss: 2.2518, Error: 84.38%
Minibatch: 700, Loss: 2.2797, Error: 82.81%
Minibatch: 800, Loss: 2.3234, Error: 84.38%
Minibatch: 900, Loss: 2.2542, Error: 81.25%
Minibatch: 1000, Loss: 2.2579, Error: 85.94%
Minibatch: 1100, Loss: 2.3469, Error: 85.94%
Minibatch: 1200, Loss: 2.3334, Error: 84.38%
Minibatch: 1300, Loss: 2.3143, Error: 85.94%
Minibatch: 1400, Loss: 2.2934, Error: 92.19%
Minibatch: 1500, Loss: 2.3875, Error: 85.94%
Minibatch: 1600, Loss: 2.2926, Error: 90.62%
Minibatch: 1700, Loss: 2.3220, Error: 87.50%
Minibatch: 1800, Loss: 2.2693, Error: 87.50%
Minibatch: 1900, Loss: 2.2864, Error: 84.38%
Minibatch: 2000, Loss: 2.2678, Error: 79.69%
Minibatch: 2100, Loss: 2.3221, Error: 92.19%
Minibatch: 2200, Loss: 2.2033, Error: 87.50%
Minibatch: 2300, Loss: 2.2493, Error: 87.50%
Minibatch: 2400, Loss: 2.4446, Error: 87.50%
Minibatch: 2500, Loss: 2.2676, Error: 85.94%
Minibatch: 2600, Loss: 2.3562, Error: 85.94%
Minibatch: 2700, Loss: 2.3290, Error: 82.81%
Minibatch: 2800, Loss: 2.3767, Error: 87.50%
Minibatch: 2900, Loss: 2.2684, Error: 76.56%
Minibatch: 3000, Loss: 2.3365, Error: 90.62%
Minibatch: 3100, Loss: 2.3369, Error: 90.62%

Any suggestions for improving my results? I'm open to any hints or exploration.

Thank you in advance

Miguel 2488
  • Usually we will recommend that you follow an existing architecture instead of coming up with one. The common ones are resnet, inception nets, densenets, etc. Alternatively, you may want to leverage pre-trained models instead. You can find them here: https://github.com/Microsoft/CNTK/blob/master/PretrainedModels/Image.md – snowflake Oct 12 '18 at 06:44
  • To learn how to use a pretrained model as a base and do transfer learning for other classes of images, you can check this out: https://github.com/Microsoft/CNTK/tree/master/Examples/Image/TransferLearning – snowflake Oct 12 '18 at 06:46

1 Answer


Anyway, to answer the question: when you are starting out, I usually recommend that for conv layers you keep filter_shape at (3, 3) and the stride at 1.

For pooling layers, stick to max pooling until you are more comfortable with deep learning. For the max pooling layers, use filter_shape=(2, 2) and strides=(2, 2).

Normally you have 2-3 conv layers followed by one max pooling layer, and you repeat this sequence until you have reduced the dimensions to something easy to work with; a sketch of this pattern follows.
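For illustration only, here is a minimal sketch of that pattern in CNTK, assuming `import cntk as C` and the same `features` input as in the question; the filter counts 8/16/32 and the function name are placeholders, not a prescription:

import cntk as C

num_output_classes = 11  # as stated in the question

def create_small_convnet(features):
    # sketch only: three blocks of two (3,3) stride-1 convs followed by a (2,2) max pool
    with C.layers.default_options(init=C.glorot_uniform(), activation=C.ops.relu, pad=True):
        h = features
        for num_filters in (8, 16, 32):   # placeholder filter counts
            h = C.layers.Convolution2D((3, 3), num_filters)(h)
            h = C.layers.Convolution2D((3, 3), num_filters)(h)
            h = C.layers.MaxPooling((2, 2), strides=(2, 2))(h)  # halves each spatial dim
        h = C.layers.Dense(96)(h)
        return C.layers.Dense(num_output_classes, activation=None, name='classify')(h)

With a (1, 128, 118) input and pad=True, the three pooling layers bring the spatial size down to roughly 16 x 15 before the dense layers.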

For the learner, you should use adam; it requires minimal tuning. You can use a learning rate of 1e-3 or 1e-4 for a start, and set momentum to 0.9; see the sketch below.
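As a rough sketch of wiring that up, assuming `z`, `loss`, and `error` are the model output and criterion functions defined as in the MNIST tutorial the question follows:

# sketch only: z, loss and error are assumed to already exist as in the referenced tutorial
lr_schedule = C.learning_rate_schedule(1e-3, C.UnitType.minibatch)
momentum_schedule = C.momentum_schedule(0.9)
learner = C.adam(z.parameters, lr=lr_schedule, momentum=momentum_schedule)
trainer = C.Trainer(z, (loss, error), [learner])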

For the minibatch size, keep it to either 16 or 32 for a start (see the loop sketched below).
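If the training loop follows the same tutorial, only the minibatch size needs to change; the names `reader_train`, `input_map`, and `num_minibatches_to_train` below are assumed to come from that tutorial's data-reading setup:

minibatch_size = 32  # or 16; the only change from the tutorial's loop
for i in range(num_minibatches_to_train):
    data = reader_train.next_minibatch(minibatch_size, input_map=input_map)
    trainer.train_minibatch(data)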

Also, when you are first attempting to get the model to converge, do it without dropout. Dropout impedes convergence. Once you are sure the model is working, add dropout back in for regularisation.

snowflake
  • Hi @snowflake, thank you for your answer. I'm sorry, I forgot to mention that I'm actually using an existing architecture: I'm following [this](https://cntk.ai/pythondocs/CNTK_103D_MNIST_ConvolutionalNeuralNetwork.html) tutorial and applying it to my own data; I was just tweaking and playing with the model parameters and changing some things. But my model is not learning anything. Sometimes it converges and the error decreases, but then I see I'm scoring 0.30 or 0.52 at best. I don't know if I could be overfitting somewhere – Miguel 2488 Oct 12 '18 at 12:32
  • Anyway, I used the same data to follow the CNTK logistic regression tutorial and got pretty good scores, such as 0.88 on validation. I just can't believe that a simple logistic regression could outperform a CNN; I would have expected to score more than 0.95 with a CNN. I'll try your suggestions and come back to tell you my results. What about the number of minibatches to train with? What could be a right number of iterations/epochs? By the way, the learner I'm using is SGD, as I specified in the question; I wanted to try adam but it requires momentum and I don't know how to use it – Miguel 2488 Oct 12 '18 at 12:37
  • Momentum is set as 0.9; keep the minibatch size to either 16 or 32. – snowflake Oct 13 '18 at 03:53
  • Hi @snowflake, thanks for your reply. I tried all your suggestions, with poor results unfortunately :/. The best I reached was 0.75 using 2 conv layers and a max pooling layer. If I use only a single dense layer and nothing more (logistic regression), the results are way better; it seems that as I add layers the predictions get more inaccurate, so it has to be something related to the layers' configuration. As for the minibatch_size, as far as I know, in CNTK as you increase the minibatch_size you have to decrease the learning_rate and vice versa. A 0.01 lr doesn't work with a 16/32 minibatch size – Miguel 2488 Oct 13 '18 at 10:16
  • 0.01 is 1e-2; use 1e-3, 1e-4, or 1e-5 for your learning rate. If the training is not converging, drop your learning rate by half or one order of magnitude. For your loss function, are you using cross_entropy_with_softmax? – snowflake Oct 13 '18 at 11:16
  • Yes, it's cross_entropy_with_softmax; I'm doing it the same way as specified in the MNIST tutorials. I tried 0.1, 0.01, 0.001, and 0.0001, and I find that as I decrease the learning rate, convergence requires more time and more batches to train with. Even then, with such a low learning rate, it may converge well over time, but in the final epochs it seems to hit a wall and won't go down to zero; also, the error rate swings from low to high and starts decreasing again, but almost never reaches 0. – Miguel 2488 Oct 13 '18 at 12:11
  • And another question: what does good convergence look like? Should the error decrease slowly or aggressively? – Miguel 2488 Oct 13 '18 at 12:18
  • It will decrease very quickly at first, then flatten. Something like these: https://www.ibm.com/blogs/research/wp-content/uploads/2018/02/deep-learning-fig2-768x424.png – snowflake Oct 13 '18 at 12:19
  • Training loss will rarely ever go down to zero unless you overfit/memorise, which is something you don't want to do. – snowflake Oct 13 '18 at 12:21
  • All right, then a good error rate in the final epochs could be something like 2 or 3% during training? Using logistic regression it goes down to zero for a good number of epochs at the end, and I'm able to score 0.89 on the validation set, not even during testing but validation – Miguel 2488 Oct 13 '18 at 12:24
  • Whether it's a good error rate or not really depends on the particular dataset you are dealing with. When you use a conv net, the number of parameters is far smaller than when using Dense layers only. So if you take the number of parameters into account, I'm sure you will find that conv nets are more efficient. – snowflake Oct 13 '18 at 13:24
  • You might want to take a look at this: https://stackoverflow.com/questions/52840888/keras-accuracy-not-increasing-over-50-on-binary-cnn-problem Perhaps you need to reinstall cntk if you continue to not be able to get good results. @Miguel2488 – snowflake Oct 18 '18 at 05:02
  • Hi @snowflake. Thank you for your suggestions, but believe me, I know my install is not the problem; I already reinstalled it several times. But I will try the suggestions in the comments :) – Miguel 2488 Oct 18 '18 at 18:30