I'm trying to train a CNN model with TensorFlow to do image classification on two classes.

I have tried many modifications to the epochs, the learning rate, the batch size, and the CNN size, but nothing works.

About the data

86 images (label 0) + 63 images (label 1)

shape: (128, 128)

About the current parameters

learning_rate = 0.00005 (I have tried values from 0.00000001 to 0.8...)

batch_size = 30 (I have also tried values from 5 to 130)

epochs = 20

About the network

import numpy as np
import tensorflow as tf


def weight_variable(shape):

    initial = tf.truncated_normal(shape, stddev = 0.1, dtype = tf.float32)
    return tf.Variable(initial)


def bias_variable(shape):

    initial = tf.constant(0.1, shape = shape, dtype = tf.float32)
    return tf.Variable(initial)


def conv2d(x, W):

    #(input, filter, strides, padding)
    #[batch, height, width, in_channels]
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):

    #(value, ksize, strides, padding)
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def cnn_model():

    epochs = 20
    batch_size = 30
    learning_rate = 0.00005
    hidden = 2
    cap_c = 86
    cap_h = 63
    num = cap_c + cap_h
    image_size = 128
    label_size = 2

    print ((num//(batch_size)) * epochs)
    train_loss = np.empty((num//(batch_size)) * epochs)
    train_acc = np.empty((num//(batch_size)) * epochs)

    x = tf.placeholder(tf.float32, shape = [None, image_size, image_size])
    y = tf.placeholder(tf.float32, shape = [None, label_size])

    weight_balance = tf.constant([0.1])

    X_train_ = tf.reshape(x, [-1, image_size, image_size, 1])

    #First layer
    W_conv1 = weight_variable([5, 5, 1, 4])
    b_conv1 = bias_variable([4])

    h_conv1 = tf.nn.relu(conv2d(X_train_, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

#    #Second layer
#    W_conv2 = weight_variable([5, 5, 4, 8])
#    b_conv2 = bias_variable([8])
#    
#    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
#    h_pool2 = max_pool_2x2(h_conv2)
#    
#    Third layer
#    W_conv3 = weight_variable([5, 5, 8, 16])
#    b_conv3 = bias_variable([16])
#    
#    h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
#    h_pool3 = max_pool_2x2(h_conv3)

    #Fully connected layer
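    # one 2x2 max-pool halves the 128x128 input to 64x64, with 4 feature maps,
    # so the flattened feature vector has 64 * 64 * 4 elements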
    W_fc1 = weight_variable([64 * 64 * 4, hidden])
    b_fc1 = bias_variable([hidden])

    h_pool2_flat = tf.reshape(h_pool1, [-1, 64 * 64 * 4])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    #Output_Softmax

    W_fc2 = weight_variable([hidden, label_size])
    b_fc2 = bias_variable([label_size])

    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

    print(y_conv.shape)



    #Train
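    # note: y_conv above has already been passed through softmax before it reaches the loss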
    loss = tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(y, y_conv, weight_balance))
    optimize = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y, 1)) 
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

About the result

The loss does not converge, and neither does the accuracy.

I don't know whether my CNN model is not suitable for my data, or

whether the activation function and loss function of the network are not suitable.

Thank you very much.

Tozz
  • Have you tried a non-weighted version of the loss? – Mad Wombat Aug 17 '17 at 18:13
  • Also, you are applying softmax to your output and then your loss function applies it again. Don't do that. Feed un-activated outputs into the loss function and apply softmax only for predictions. – Mad Wombat Aug 17 '17 at 18:15
  • @MadWombat Thanks for your help. You mean I should use `out_put = tf.add(tf.matmul(h_fc1_drop, W_fc2), b_fc2)` when I feed output into loss function? And then I still use `tf.nn.softmax_cross_entropy_with_logits` to my loss function? Thank you – Tozz Aug 17 '17 at 18:24
  • Yes, that would be a good start. And you can do `pred = tf.nn.softmax(out_put)` and use that to generate your actual predictions. – Mad Wombat Aug 17 '17 at 18:25
  • @MadWombat Yes, I have modified it. But the result is still not convergent. Could you help me to have a look at my network, and if I have something that is unsuitable with my data set or I have other problems? – Tozz Aug 17 '17 at 18:34
  • When you say that your network doesn't converge, what exactly do you mean? Does your loss go down during training at all? Does it fluctuate? Does it go up? What accuracy do you get from a few epochs? – Mad Wombat Aug 17 '17 at 20:48
  • @MadWombat Sorry, I didn't explain clearly. After trying a larger batch_size (about 130), the loss always converges to the same value, 0.4000, but the overall trend goes down. The accuracy, however, always fluctuates. (By the way, there is also something I don't understand: there were three times when the training process was perfect, the loss went down well and the accuracy went up well, and the final test accuracy was about 0.9 in those three runs. However, it didn't last long...) – Tozz Aug 17 '17 at 21:00
  • You have a total of 149 images in your training dataset, yet you are training on a batch size of 150. That means you are using your whole dataset on each training iteration. There are multiple problems here, but the main one is that you are almost forcing the network to overfit by using the same batch over and over again. – Mad Wombat Aug 17 '17 at 21:08
  • Try a few of these things. 1. Reduce your batch size to some number around 5-10. 2. Train for more epochs with the smaller batch size. 3. See if your network converges better without the dropout (not likely, but who knows) 4. Adam is a very complex algorithm, see if you get better results with a simpler optimizer (try GradientDescentOptimizer first). – Mad Wombat Aug 17 '17 at 21:11
  • Also, introduce data shuffling and batch normalization into your model. – Mad Wombat Aug 17 '17 at 21:11
  • @MadWombat Thanks. I used your methods and the loss goes down well (I set epochs = 42, batch_size = 20, learning_rate = 0.05), but the accuracy still fluctuates violently... I wonder if something is wrong during the training process, since the predicted values are not very correct (all 1 or all 0 in every batch). – Tozz Aug 18 '17 at 04:39
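For reference, a minimal sketch of the changes suggested in the comments above: un-activated logits into the loss, softmax only for predictions, a plain gradient-descent optimizer, and shuffling once per epoch. It reuses h_fc1_drop, W_fc2, b_fc2, y, and learning_rate from the question; train_images and train_labels are hypothetical NumPy arrays holding the data.

out_put = tf.add(tf.matmul(h_fc1_drop, W_fc2), b_fc2)   # un-activated logits
pred = tf.nn.softmax(out_put)                           # softmax only for predictions

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=out_put))
optimize = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# shuffle the (hypothetical) training arrays once per epoch before slicing into batches
perm = np.random.permutation(len(train_images))
train_images, train_labels = train_images[perm], train_labels[perm]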

1 Answer


There are a couple of problems with the code:

  1. You are applying softmax on the last layer and then calling tf.nn.weighted_cross_entropy_with_logits, which in turn applies a sigmoid activation, so you are applying an activation twice.
  2. For weight initialisation, use Xavier or variance-scaling initialisation for faster convergence. It is better to implement your model with the tf.layers API, as its default settings follow best practices (a sketch of both points is below).
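As an illustration only, here is a minimal sketch of both points. It assumes the x and y placeholders and the image_size, hidden, and label_size values from the question, and uses the TF 1.x contrib initialisers.

# tf.layers with explicit Xavier / variance-scaling initialisation
X_train_ = tf.reshape(x, [-1, image_size, image_size, 1])

conv1 = tf.layers.conv2d(X_train_, filters=4, kernel_size=5, padding='same',
                         activation=tf.nn.relu,
                         kernel_initializer=tf.contrib.layers.xavier_initializer())
pool1 = tf.layers.max_pooling2d(conv1, pool_size=2, strides=2)

flat = tf.reshape(pool1, [-1, 64 * 64 * 4])
fc1 = tf.layers.dense(flat, hidden, activation=tf.nn.relu,
                      kernel_initializer=tf.contrib.layers.variance_scaling_initializer())

# un-activated logits; the loss applies its own activation internally
logits = tf.layers.dense(fc1, label_size)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))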
Vijay Mariappan
  • Thanks for your answer :-) I have changed to `loss = tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(y, out_feed, weight_balance))` and `optimize = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)`. But the problem now is that the loss converges well while the accuracy fluctuates violently... – Tozz Aug 18 '17 at 05:03