I have written the encoder and decoder functions using the tf.layers API. Both are three layers deep.

import tensorflow as tf


def Encoder(real_img):
    with tf.variable_scope("encoder"):
        # three conv + stride-2 pool stages: 28 -> 14 -> 7 -> 4 with "same" pooling
        conv1 = tf.layers.conv2d(inputs=real_img, filters=32, kernel_size=[5, 5],
                                 use_bias=True, padding="same", activation=tf.nn.leaky_relu)
        pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2],
                                        strides=[2, 2], padding="same")
        conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 5],
                                 use_bias=True, padding="same", activation=tf.nn.leaky_relu)
        pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2],
                                        strides=[2, 2], padding="same")
        conv3 = tf.layers.conv2d(inputs=pool2, filters=128, kernel_size=[5, 5],
                                 use_bias=True, padding="same", activation=tf.nn.leaky_relu)
        pool3 = tf.layers.max_pooling2d(inputs=conv3, pool_size=[2, 2],
                                        strides=[2, 2], padding="same")
        return pool3


def Decoder(Z):
    with tf.variable_scope("decoder"):
        # four stride-2 transpose stages, each doubling the spatial size
        deconv1 = tf.layers.conv2d_transpose(inputs=Z, filters=128, kernel_size=[5, 5],
                                             padding="same", strides=[2, 2])
        deconv2 = tf.layers.conv2d_transpose(inputs=deconv1, filters=64, kernel_size=[5, 5],
                                             padding="same", strides=[2, 2])
        deconv3 = tf.layers.conv2d_transpose(inputs=deconv2, filters=32, kernel_size=[5, 5],
                                             padding="same", strides=[2, 2])
        deconv4 = tf.layers.conv2d_transpose(inputs=deconv3, filters=1, kernel_size=[5, 5],
                                             padding="same", strides=[2, 2])
        return deconv4

real_img = tf.placeholder(dtype=tf.float32, shape=[None, 784])
X = tf.reshape(real_img, [-1, 28, 28, 1])
enc = Encoder(X)
dec = Decoder(enc)
cost = tf.reduce_sum(tf.square(X - dec))

Error:

    ValueError: Dimensions must be equal, but are 28 and 64 for 'sub' (op: 'Sub') with input shapes: [?,28,28,1], [?,64,64,1].

How do I get the decoded image in 28x28 shape?

1 Answer

tf.layers.max_pooling2d also has a padding parameter. Setting it to "same", as for the convolutions, should fix this. Otherwise your pooling will slightly shrink the input beyond the halving from the stride, because "valid" padding floors the output size while "same" rounds it up.
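
For reference, here is a minimal sketch (plain Python; the helper name is mine) of how TensorFlow computes the pooled output size for each padding mode, which shows where the extra shrinkage comes from:

import math

def pool_output_size(size, pool_size=2, stride=2, padding="valid"):
    # TensorFlow's shape rules for pooling:
    #   "valid": ceil((size - pool_size + 1) / stride)  -- effectively floors
    #   "same" : ceil(size / stride)
    if padding == "valid":
        return math.ceil((size - pool_size + 1) / stride)
    return math.ceil(size / stride)

# Three stride-2 pools on a 28x28 input:
#   "valid": 28 -> 14 -> 7 -> 3
#   "same" : 28 -> 14 -> 7 -> 4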

Note that you will also need your last convolutional transpose layer to use only 1 filter -- right now your reconstructions would be [?, 28, 28, 32], but you need [?, 28, 28, 1] like the input.
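
Tracing the spatial sizes through the updated code above shows where the remaining mismatch comes from:

    encoder (three stride-2 "same" pools):  28 -> 14 -> 7 -> 4
    decoder (four stride-2 transposes):      4 -> 8 -> 16 -> 32 -> 64

so the reconstruction is [?, 64, 64, 1] against an input of [?, 28, 28, 1], which is exactly the error quoted in the comments below.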

xdurch0
  • I did the suggested changes, but this seems to increase the size way too much. `ValueError: Dimensions must be equal, but are 28 and 64 for 'sub' (op: 'Sub') with input shapes: [?,28,28,1], [?,64,64,1].` @xdurch0 – Nimish Ronghe Mar 29 '18 at 17:23
  • Can you update your post with the code as it is right now? – xdurch0 Mar 29 '18 at 21:29
  • Well now you have one strided transpose layer too many in the decoder. Each strided transpose increases the spatial size, so the downsampling in the encoder needs to match the upsampling in the decoder (see the sketch below). – xdurch0 Mar 30 '18 at 19:09
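
A minimal sketch of such a matched pair (assumptions mine: since 28 is divisible by 4 but not by 8, this version uses two stride-2 stages on each side so 28 -> 14 -> 7 -> 14 -> 28 round-trips exactly; filter counts are illustrative):

import tensorflow as tf

def encoder(x):
    with tf.variable_scope("encoder"):
        conv1 = tf.layers.conv2d(inputs=x, filters=32, kernel_size=[5, 5],
                                 padding="same", activation=tf.nn.leaky_relu)
        pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2],
                                        strides=[2, 2], padding="same")  # 28 -> 14
        conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 5],
                                 padding="same", activation=tf.nn.leaky_relu)
        pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2],
                                        strides=[2, 2], padding="same")  # 14 -> 7
        return pool2  # [?, 7, 7, 64]

def decoder(z):
    with tf.variable_scope("decoder"):
        deconv1 = tf.layers.conv2d_transpose(inputs=z, filters=32, kernel_size=[5, 5],
                                             padding="same", strides=[2, 2],
                                             activation=tf.nn.leaky_relu)  # 7 -> 14
        deconv2 = tf.layers.conv2d_transpose(inputs=deconv1, filters=1, kernel_size=[5, 5],
                                             padding="same", strides=[2, 2])  # 14 -> 28
        return deconv2  # [?, 28, 28, 1] -- matches the input shape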