I'm basically following this guide to build convolutional autoencoder with tensorflow backend. The main difference to the guide is that my data is 257x257 grayscale images. The following code:
TRAIN_FOLDER = 'data/OIRDS_gray/'
EPOCHS = 10
SHAPE = (257,257,1)
FILELIST = os.listdir(TRAIN_FOLDER)
def loadTrainData():
train_data = []
for fn in FILELIST:
img = misc.imread(TRAIN_FOLDER + fn)
img = np.reshape(img,(len(img[0,:]), len(img[:,0]), SHAPE[2]))
if img.shape != SHAPE:
print "image shape mismatch!"
print "Expected: "
print SHAPE
print "but got:"
print img.shape
sys.exit()
train_data.append (img)
train_data = np.array(train_data)
train_data = train_data.astype('float32')/ 255
return np.array(train_data)
def createModel():
input_img = Input(shape=SHAPE)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid',padding='same')(x)
return Model(input_img, decoded)
x_train = loadTrainData()
autoencoder = createModel()
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
print x_train.shape
autoencoder.summary()
# Run the network
autoencoder.fit(x_train, x_train,
epochs=EPOCHS,
batch_size=128,
shuffle=True)
gives me a error:
ValueError: Error when checking target: expected conv2d_7 to have shape (None, 260, 260, 1) but got array with shape (859, 257, 257, 1)
As you can see this is not the standard problem with theano/tensorflow backend dim ordering, but something else. I checked that my data is what it's supposed to be with print x_train.shape
:
(859, 257, 257, 1)
And I also run autoencoder.summary()
:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 257, 257, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 257, 257, 16) 160
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 129, 129, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 129, 129, 8) 1160
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 65, 65, 8) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 65, 65, 8) 584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 33, 33, 8) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 33, 33, 8) 584
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 66, 66, 8) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 66, 66, 8) 584
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 132, 132, 8) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 132, 132, 16) 1168
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 264, 264, 16) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 264, 264, 1) 145
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
Now I'm not exactly sure where the problem is, but it does look like things go wrong around conv2d_6 (Param # too high). I do know how CAE's work on principle, but I'm not that familiar with the exact technical details yet and I have tried to solve this mainly by messing with deconvolution padding (instead of same, using valid). The closes I got to dims matching was (None, 258, 258, 1)
. I achieved this by blindly trying different combinations of padding on deconvolution side, not really a smart way to solve a problem...
At this point I'm at a loss, and any help would be appreciated