keras-tensorflow CAE dimension mismatch

Question

I'm basically following this guide to build a convolutional autoencoder with the TensorFlow backend. The main difference from the guide is that my data consists of 257x257 grayscale images. The following code:

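import os
import sys

import numpy as np
from scipy import misc
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

TRAIN_FOLDER = 'data/OIRDS_gray/'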
EPOCHS = 10
SHAPE = (257,257,1)

FILELIST = os.listdir(TRAIN_FOLDER)

def loadTrainData():
    train_data = []
    for fn in FILELIST:
        img = misc.imread(TRAIN_FOLDER + fn)
        img = np.reshape(img,(len(img[0,:]), len(img[:,0]), SHAPE[2]))
        if img.shape != SHAPE:
            print "image shape mismatch!"
            print "Expected: " 
            print SHAPE 
            print "but got:"
            print img.shape
            sys.exit()
        train_data.append (img)
    train_data = np.array(train_data)
    train_data = train_data.astype('float32')/ 255

    return np.array(train_data)

def createModel():
    input_img = Input(shape=SHAPE)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)

    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)  
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(16, (3, 3), activation='relu',padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid',padding='same')(x)
    return Model(input_img, decoded)


x_train = loadTrainData()
autoencoder = createModel()
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

print x_train.shape
autoencoder.summary()

# Run the network
autoencoder.fit(x_train, x_train,
                epochs=EPOCHS,
                batch_size=128,
                shuffle=True)

gives me an error: ValueError: Error when checking target: expected conv2d_7 to have shape (None, 260, 260, 1) but got array with shape (859, 257, 257, 1)

As you can see this is not the standard problem with theano/tensorflow backend dim ordering, but something else. I checked that my data is what it's supposed to be with print x_train.shape:

(859, 257, 257, 1)

I also ran autoencoder.summary():

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 257, 257, 1)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 257, 257, 16)      160
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 129, 129, 16)      0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 129, 129, 8)       1160
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 65, 65, 8)         0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 65, 65, 8)         584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 33, 33, 8)         0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 33, 33, 8)         584
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 66, 66, 8)         0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 66, 66, 8)         584
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 132, 132, 8)       0
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 132, 132, 16)      1168
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 264, 264, 16)      0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 264, 264, 1)       145
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________

Now I'm not exactly sure where the problem is, but it does look like things go wrong around conv2d_6 (Param # too high). I know how CAEs work in principle, but I'm not that familiar with the exact technical details yet, and so far I have tried to solve this mainly by changing the padding on the deconvolution side (using valid instead of same). The closest I got to matching dimensions was (None, 258, 258, 1). I achieved this by blindly trying different combinations of padding on the deconvolution side, which is not really a smart way to solve a problem...

At this point I'm at a loss, and any help would be appreciated.

score 1 · Accepted Answer · answered Sep 20 '17 at 12:44

Since your input and output data are the same, your final output shape should be the same as the input shape.

The last convolutional layer should have shape (None, 257,257,1).

The problem is happening because you have an odd number as the sizes of the images (257).

Each MaxPooling layer halves the size, so with an odd size it has to round either up or down. With padding='same' it rounds up: see the 129, which comes from 257/2 = 128.5.

Later, when you do UpSampling, the model doesn't know that the current dimensions were rounded; it simply doubles the value. Repeated over three pooling/upsampling pairs, this adds 7 pixels to the final result (264 instead of 257).
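The arithmetic can be traced in plain Python. This is only a sketch: it assumes that pooling with padding='same' and a stride of 2 gives ceil(n / 2), and that UpSampling2D((2, 2)) doubles the size, which matches the summary in the question. trace_sizes is a hypothetical helper, not a Keras function.

```python
def trace_sizes(size, n_pool):
    """Spatial size after each of n_pool poolings, then n_pool upsamplings."""
    sizes = [size]
    for _ in range(n_pool):
        size = (size + 1) // 2  # assumed: 'same' pooling rounds up
        sizes.append(size)
    for _ in range(n_pool):
        size *= 2               # upsampling just doubles, rounding is not undone
        sizes.append(size)
    return sizes

sizes = trace_sizes(257, 3)
print(sizes)                 # [257, 129, 65, 33, 66, 132, 264]
print(sizes[-1] - sizes[0])  # 7
```

The same helper shows that a size of 264 survives the round trip unchanged.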

You could try either cropping the result or padding the input.

I usually work with images of compatible sizes. If you have 3 MaxPooling layers, your size should be a multiple of 2³ = 8. For 257, the next multiple of 8 is 264.
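If you want to compute the target size and the padding for other image sizes, something like the following sketch could be used. padding_for is a hypothetical helper name, and splitting the extra pixels with the smaller half first is just one possible choice.

```python
def padding_for(size, n_pool):
    """Return (target, before, after) so that size + before + after == target,
    where target is the smallest multiple of 2**n_pool that is >= size."""
    factor = 2 ** n_pool
    target = -(-size // factor) * factor  # ceiling division, then scale back up
    extra = target - size
    before = extra // 2
    return target, before, extra - before

print(padding_for(257, 3))  # (264, 3, 4)
```

For 257 and three pooling layers this gives the (3, 4) padding used in the snippets in this answer.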

Padding the input data directly:

x_train = np.pad(x_train, ((0, 0), (3, 4), (3, 4), (0, 0)), mode='constant')

This will require that SHAPE = (264, 264, 1).

Padding inside the model:

import keras.backend as K
from keras.layers import Lambda

input_img = Input(shape=SHAPE)
x = Lambda(lambda x: K.spatial_2d_padding(x, padding=((3, 4), (3, 4))), output_shape=(264,264,1))(input_img)

Cropping the results:

This is required whenever the target array keeps its original 257x257 shape, that is, whenever you did not pad the actual data (the NumPy array) directly.

decoded = Lambda(lambda x: x[:,3:-4,3:-4,:], output_shape=SHAPE)(x)
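To check that the (3, 4) padding and the 3:-4 crop are exact inverses, you can try them on a dummy array with NumPy alone. This is only a sanity check, not part of the model: the array below is random data standing in for x_train, and the batch size of 2 is arbitrary.

```python
import numpy as np

# Dummy stand-in for x_train: 2 grayscale 257x257 images (random, not real data).
x = np.random.rand(2, 257, 257, 1).astype('float32')

# Pad height and width by (3, 4); the batch and channel axes are left alone.
padded = np.pad(x, ((0, 0), (3, 4), (3, 4), (0, 0)), mode='constant')
print(padded.shape)   # (2, 264, 264, 1)

# Same slice as in the cropping Lambda.
cropped = padded[:, 3:-4, 3:-4, :]
print(cropped.shape)  # (2, 257, 257, 1)
print(np.array_equal(cropped, x))  # True
```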

Thank you very much, this cleared up any problems I had. Would have taken some time for me to realize on my own by manually debugging, though quite clear in hindsight. — jfp, Sep 21 '17 at 06:50
