Good morning, I'm implementing a binary classification model with two mutually exclusive one-hot encoded labels. I've finally reached the stage where the program outputs something, but sadly it predicts the same class the whole time. At one point it did not do that, so I was wondering whether the problem is that the program very easily falls into a local minimum of the loss function, or perhaps saturation of the softmax activation. The variables are initialized with tf.truncated_normal, and I have been reducing the standard deviation in case the problem was indeed softmax saturation. The images are RGB. Also, since my computer is not the most capable, I'm running batches of 50-100 images (they are reasonably big, 480*704) for about 20-40 epochs.

The model per se is:

with tf.Graph().as_default():
    print("Creating Graph [{}]".format(datetime.datetime.now().strftime("%H:%M:%S")))
    x = tf.placeholder(tf.float32, [None, 480, 704, 3])
    y_true = tf.placeholder(tf.float32, [None, 2])
    is_training = tf.placeholder(tf.bool, [])

    with tf.name_scope("Conv_layers"):
        conv_1 = conv_layer(x, [5, 5, 3, 2])
        conv_pool_1 = max_pool_4x4(conv_1)

        conv_2 = conv_layer(conv_pool_1, [5, 5, 2, 4])
        conv_pool_2 = max_pool_2x2(conv_2)

        conv_3 = conv_layer(conv_pool_2, [5, 5, 4, 8])
        conv_pool_3 = max_pool_2x2(conv_3)

        conv_4 = conv_layer(conv_pool_3, [5, 5, 8, 16])
        conv_pool_4 = max_pool_2x2(conv_4)

    # After the 4x and three 2x max-pools, the 480x704 maps are 15x22 with 16 channels
    to_flat = tf.reshape(conv_pool_4, [-1, 22*15*16])
    full_1 = full_layer(to_flat, 1024)

    y_conv = full_layer(full_1, 2)
    # Raw logits are kept during training; softmax is applied only at inference
    y_conv = tf.cond(is_training, lambda: tf.identity(y_conv), lambda: tf.nn.softmax(y_conv))

The loss function and accuracy:

    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_conv), reduction_indices=[1]))
    train_step = tf.train.AdamOptimizer(0.03).minimize(cross_entropy)
    correct_pred = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_true, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
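
As suggested in a comment below, the manual loss could be replaced by TensorFlow's built-in op, which takes the raw (pre-softmax) logits and is numerically stable; a minimal sketch, assuming `y_conv` holds the logits during training:

    # Sketch: built-in softmax cross-entropy computed directly from the raw logits
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=y_conv))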

The functions used:

def weight_variables(shape):
    initializer = tf.truncated_normal(shape=shape, stddev=0.05)
    return tf.Variable(initializer)
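
A comment below also suggests Glorot initialization; a minimal sketch (my untested assumption of how it would slot in) of `weight_variables` using it:

def weight_variables(shape):
    # Sketch: Glorot/Xavier initialization scales the variance to the
    # layer's fan-in and fan-out instead of using a fixed stddev
    initializer = tf.glorot_normal_initializer()
    return tf.Variable(initializer(shape))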

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def max_pool_4x4(x):
    return tf.nn.max_pool(x, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')


def conv_layer(input, shape):
    W = weight_variables(shape=shape)
    b = bias_variable([shape[3]])
    # the bias was created but never used; add it before the ReLU
    return tf.nn.relu(conv2d(input, W) + b)


def full_layer(input, size):
    in_size = int(input.get_shape()[1])
    W = weight_variables([in_size, size])
    b = bias_variable([size])
    return tf.matmul(input, W) + b

Part of a typical prediction looks like:

[[0.28597853 0.71402144]
 [0.28610235 0.71389765]
 [0.28605604 0.713944  ]
 [0.28603107 0.71396893]
 [0.28613603 0.7138639 ]
 [0.2860006  0.7139994 ]
 [0.28612924 0.71387076]
 [0.28628975 0.71371025]
 [0.28614312 0.7138569 ]
 [0.28609362 0.71390635]
 [0.28626445 0.7137355 ]
 [0.28617397 0.71382606]]

And increasing the size of the convolutional layers has led my model to output things like:

[[0. 1.]
 [0. 1.]
 [0. 1.]
 [0. 1.]
 [0. 1.]
 [0. 1.]
 [0. 1.]]
  • In such cases, before you start thinking about local minima and softmax saturation, the very first thing to check is whether your dataset is **imbalanced**, i.e. whether one class (the one that gets predicted) is (heavily) overrepresented relative to the other. – desertnaut Jul 31 '19 at 09:32
  • Hello, it was indeed imbalanced at the beginning, but I already fixed that and now each batch is 50%-50% – MarcMiranda Jul 31 '19 at 09:36
  • Why are you computing the loss manually? Have you tried [softmax_cross_entropy_with_logits_v2](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2)? – thushv89 Jul 31 '19 at 10:15
  • Also, it is difficult to say what is going on without looking at the training loss / test accuracy and their behavior over time. You can try other (better) initializations like [glorot](https://www.tensorflow.org/api_docs/python/tf/glorot_normal_initializer). You can try hyperparameter optimization (e.g. your learning rate `0.03` seems arbitrary; see the sketch below). You can try pretrained models (e.g. AlexNet) and fine-tuning them. – thushv89 Jul 31 '19 at 10:18
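
Following the last comment, lowering the Adam learning rate is a one-line change; the value below is only an illustrative guess, not something from the original post:

    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)  # hypothetical smaller rate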
