Deep Learning implementation in Tensorflow or Keras give drastic different results

Question

Context: I'm using a fully convolutional network to perform image segmentation. Typically, the input is an RGB image shape = [512, 256] and the target is a 2 channels binary mask defining the annotated regions (2nd channel is the opposite of the fist channel).

Question: I have the same CNN implementation using Tensorflow and Keras. But the Tensorflow model doesn't start learning. Actually, the loss even grows with the number of epochs! What is wrong in this Tensorflow implementation that prevents it from learning?

Setup: The dataset is split into 3 subsets: training (78%), testing (8%) and validation (14%) sets which are fed to the network by batches of 8 images. The graphs show the evolution of the loss for each subsets. The images show the prediction after 10 epoch for two different images.

Tensorflow implementation and results

import tensorflow as tf

tf.reset_default_graph()
x = inputs = tf.placeholder(tf.float32, shape=[None, shape[1], shape[0], 3])
targets = tf.placeholder(tf.float32, shape=[None, shape[1], shape[0], 2])

for d in range(4):
    x = tf.layers.conv2d(x, filters=np.exp2(d+4), kernel_size=[3,3], strides=[1,1], padding="SAME", activation=tf.nn.relu)
    x = tf.layers.max_pooling2d(x, strides=[2,2], pool_size=[2,2], padding="SAME")
    
x = tf.layers.conv2d(x, filters=2, kernel_size=[1,1])
logits = tf.image.resize_images(x, [shape[1], shape[0]], align_corners=True)
prediction = tf.nn.softmax(logits)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=logits))
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

def run(mode, x_batch, y_batch):
    if mode == 'TRAIN':
        return sess.run([loss, optimizer], feed_dict={inputs: x_batch, targets: y_batch})
    else:
        return sess.run([loss, prediction], feed_dict={inputs: x_batch, targets: y_batch})

Keras implementation and reslults

import keras as ke

ke.backend.clear_session()
x = inputs = ke.layers.Input(shape=[shape[1], shape[0], 3])

for d in range(4):
    x = ke.layers.Conv2D(int(np.exp2(d+4)), [3,3], padding="SAME", activation="relu")(x)
    x = ke.layers.MaxPool2D(padding="SAME")(x)

x = ke.layers.Conv2D(2, [1,1], padding="SAME")(x)
logits = ke.layers.Lambda(lambda x: ke.backend.tf.image.resize_images(x, [shape[1], shape[0]], align_corners=True))(x)
prediction = ke.layers.Activation('softmax')(logits)

model = ke.models.Model(inputs=inputs, outputs=prediction)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")

def run(mode, x_batch, y_batch):
    if mode == 'TRAIN':
        loss = model.train_on_batch(x=x_batch, y=y_batch)
        return loss, None
    else:
        loss = model.evaluate(x=x_batch, y=y_batch, batch_size=None, verbose=0)
        prediction = model.predict(x=x_batch, batch_size=None)
        return loss, prediction

There must be a difference between the two but my understanding of the documentation lead me nowhere. I would be really interested to know where the difference lies. Thanks in advance!

I dont understand what those plots are. But the only discrepancy I find between your 2 versions is the TF one uses same padding for pooling layers which is unusual and probably the source of error. — Priyatham, Mar 29 '18 at 17:01
Thanks but it doesn't make the difference. I completed the question with more details, and gave results with the "same" padding in both TF and K. — Jav, Mar 30 '18 at 08:20

score 2 · Accepted Answer · answered Mar 30 '18 at 13:05

The answer was in the Keras implementation of softmax where they subtract an unexpected max:

def softmax(x, axis=-1):
    # when x is a 2 dimensional tensor
    e = K.exp(x - K.max(x, axis=axis, keepdims=True))
    s = K.sum(e, axis=axis, keepdims=True)
    return e / s

Here is the Tensorflow implementation updated with the max hack and the good results associated

import tensorflow as tf

tf.reset_default_graph()
x = inputs = tf.placeholder(tf.float32, shape=[None, shape[1], shape[0], 3])
targets = tf.placeholder(tf.float32, shape=[None, shape[1], shape[0], 2])

for d in range(4):
    x = tf.layers.conv2d(x, filters=np.exp2(d+4), kernel_size=[3,3], strides=[1,1], padding="SAME", activation=tf.nn.relu)
    x = tf.layers.max_pooling2d(x, strides=[2,2], pool_size=[2,2], padding="SAME")

x = tf.layers.conv2d(x, filters=2, kernel_size=[1,1])
logits = tf.image.resize_images(x, [shape[1], shape[0]], align_corners=True)
# The misterious hack took from Keras
logits = logits - tf.expand_dims(tf.reduce_max(logits, axis=-1), -1)
prediction = tf.nn.softmax(logits)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=logits))
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

def run(mode, x_batch, y_batch):
    if mode == 'TRAIN':
        return sess.run([loss, optimizer], feed_dict={inputs: x_batch, targets: y_batch})
    else:
        return sess.run([loss, prediction], feed_dict={inputs: x_batch, targets: y_batch})

Huge thanks to Simon for pointing this out on the Keras implementation :-)

Deep Learning implementation in Tensorflow or Keras give drastic different results

1 Answers1