Good morning, I'm implementing a binary classification model with two mutually exclusive, one-hot encoded labels. I've finally reached the stage where the program outputs something, but unfortunately it predicts the same class all the time. At one point it did not do that, so I'm wondering whether the program too easily decays into a local minimum of the loss function, or whether the problem is saturation of the softmax activation.

The variables are initialized with tf.truncated_normal, and I have been playing with reducing the standard deviation in case the problem really is softmax saturation. The images are RGB. Also, since my computer is not the most capable, I'm running batches of 50-100 images (they are reasonably big, 480*704) for about 20-40 epochs.
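For clarity on the labels: each image belongs to exactly one of the two classes, so every row of y_true is either [1, 0] or [0, 1]. A minimal sketch of the shapes I feed in (the arrays here are just illustrative placeholders, not my actual loading code):

import numpy as np

# Illustrative batch only: 50 RGB images of 480x704 with exclusive one-hot labels
batch_x = np.zeros((50, 480, 704, 3), dtype=np.float32)  # stands in for real image data
batch_y = np.zeros((50, 2), dtype=np.float32)
batch_y[:25, 0] = 1.0   # class 0 -> [1, 0]
batch_y[25:, 1] = 1.0   # class 1 -> [0, 1]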
The model itself is:
import datetime
import tensorflow as tf

with tf.Graph().as_default():
    print("Creating Graph [{}]".format(datetime.datetime.now().strftime("%H:%M:%S")))

    # RGB images of 480x704 and one-hot labels over the two classes
    x = tf.placeholder(tf.float32, [None, 480, 704, 3])
    y_true = tf.placeholder(tf.float32, [None, 2])
    is_training = tf.placeholder(tf.bool, [])

    with tf.name_scope("Conv_layers"):
        conv_1 = conv_layer(x, [5, 5, 3, 2])
        conv_pool_1 = max_pool_4x4(conv_1)

        conv_2 = conv_layer(conv_pool_1, [5, 5, 2, 4])
        conv_pool_2 = max_pool_2x2(conv_2)

        conv_3 = conv_layer(conv_pool_2, [5, 5, 4, 8])
        conv_pool_3 = max_pool_2x2(conv_3)

        conv_4 = conv_layer(conv_pool_3, [5, 5, 8, 16])
        conv_pool_4 = max_pool_2x2(conv_4)

    to_flat = tf.reshape(conv_pool_4, [-1, 22*15*16])
    full_1 = full_layer(to_flat, 1024)
    y_conv = full_layer(full_1, 2)
    # softmax is applied only at inference time; during training y_conv stays as raw logits
    y_conv = tf.cond(is_training, lambda: tf.identity(y_conv), lambda: tf.nn.softmax(y_conv))
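As a sanity check on the reshape, the 22*15*16 follows from the pooling stack: one 4x4 pool and three 2x2 pools divide each spatial dimension by 32 (with 'SAME' padding), so 480 -> 15 and 704 -> 22, and conv_4 has 16 channels:

# Flattened size after the pooling stack (480 and 704 each divided by 4*2*2*2 = 32)
height = 480 // (4 * 2 * 2 * 2)    # 15
width = 704 // (4 * 2 * 2 * 2)     # 22
channels = 16
print(height * width * channels)   # 5280 == 22*15*16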
The loss function and accuracy:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(0.03).minimize(cross_entropy)
correct_pred = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
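For context on how training is run, the loop is roughly of this shape (simplified; get_batches is just a placeholder name for my actual data loading, and this sits inside the same with tf.Graph().as_default(): block as the model):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(30):                       # roughly 20-40 epochs
        for batch_x, batch_y in get_batches(50):  # batches of 50-100 images; placeholder helper
            sess.run(train_step, feed_dict={x: batch_x,
                                            y_true: batch_y,
                                            is_training: True})
        acc = sess.run(accuracy, feed_dict={x: batch_x,
                                            y_true: batch_y,
                                            is_training: False})
        print("Epoch {}: accuracy on last batch = {:.3f}".format(epoch, acc))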
The functions used:
def weight_variables(shape):
    initializer = tf.truncated_normal(shape=shape, stddev=0.05)
    return tf.Variable(initializer)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def max_pool_4x4(x):
    return tf.nn.max_pool(x, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')

def conv_layer(input, shape):
    W = weight_variables(shape=shape)
    b = bias_variable([shape[3]])
    return tf.nn.relu(conv2d(input, W))

def full_layer(input, size):
    in_size = int(input.get_shape()[1])
    W = weight_variables([in_size, size])
    b = bias_variable([size])
    return tf.matmul(input, W) + b
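Since the stddev in weight_variables is what I've been reducing to rule out softmax saturation, a quick check I can run on a single untrained batch (batch_x stands in for real image data; this has to run inside the same graph block as the model) looks like this:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # is_training=True keeps y_conv as raw logits, so the softmax here is computed explicitly
    logits, probs = sess.run([y_conv, tf.nn.softmax(y_conv)],
                             feed_dict={x: batch_x, is_training: True})
    print("logits range:", logits.min(), logits.max())
    print("softmax sample:", probs[:3])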
Part of a typical prediction looks like this:
[[0.28597853 0.71402144]
[0.28610235 0.71389765]
[0.28605604 0.713944 ]
[0.28603107 0.71396893]
[0.28613603 0.7138639 ]
[0.2860006 0.7139994 ]
[0.28612924 0.71387076]
[0.28628975 0.71371025]
[0.28614312 0.7138569 ]
[0.28609362 0.71390635]
[0.28626445 0.7137355 ]
[0.28617397 0.71382606]]
And increasing the size of the convolutional layers has led my model to output things like:
[[0. 1.]
[0. 1.]
[0. 1.]
[0. 1.]
[0. 1.]
[0. 1.]
[0. 1.]]
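For completeness, both arrays above come from running the network in inference mode, roughly like this (batch_x again stands in for a real batch):

# With is_training=False, y_conv is the softmax output, so each row is the class probabilities
preds = sess.run(y_conv, feed_dict={x: batch_x, is_training: False})
print(preds)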