Setup
I built an autoencoder for images with TensorFlow. My images are around 30 pixels in height and width. I am using 5 layers:
- Input layer
- Encoder layer with 256 neurons with linear functions. (This layer is supposed to function sort of as a preprocessing PCA.)
- Encoder layer with 128 neurons with sigmoid functions.
- Decoder layer with 256 neurons with sigmoid functions.
- Decoder/output layer with as many neurons as the input with linear function.
All layers use biases and are defined like this:
layer_1 = tf.nn.sigmoid(tf.add(
tf.matmul(x, tf.Variable(tf.random_normal([n_input, n_hidden_1]))),
tf.Variable(tf.random_normal([n_hidden_1]))
))
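To make the layer sizes concrete, here is a small sketch that lists the weight and bias shapes implied by the five layers above and counts the trainable parameters. It assumes exactly 30x30 single-channel images flattened to n_input = 900 (my images are only roughly that size), and the helper name layer_shapes is made up for illustration.

```python
# Assumption: 30x30 grayscale images flattened into vectors of length 900.
n_input = 30 * 30
hidden_sizes = [256, 128, 256]  # encoder 1, encoder 2, decoder 1

def layer_shapes(n_in, hidden, n_out):
    """Return (weight_shape, bias_shape) for each dense layer, in order."""
    sizes = [n_in] + list(hidden) + [n_out]
    return [((a, b), (b,)) for a, b in zip(sizes[:-1], sizes[1:])]

# The output layer has as many neurons as the input layer.
shapes = layer_shapes(n_input, hidden_sizes, n_input)
n_params = sum(w[0] * w[1] + b[0] for w, b in shapes)

for w, b in shapes:
    print("weights", w, "biases", b)
print("trainable parameters:", n_params)
```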
My cost is defined by
cost = tf.reduce_mean(tf.div(tf.reduce_sum(tf.pow(y_true - y_pred, 2)), 2))
and I used RMSPropOptimizer with a starting learning rate of 0.01.
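As a sanity check on what the cost expression evaluates to, here is a pure-Python sketch of the same arithmetic on a toy batch. Since tf.reduce_sum is called without an axis it returns a single scalar, so the outer tf.reduce_mean leaves it unchanged and the cost is half the summed squared error over the whole batch. The toy values and the helper name half_sse are assumptions for illustration only.

```python
def half_sse(y_true, y_pred):
    """Half the sum of squared errors over every pixel of every image."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += (t - p) ** 2
    return total / 2

# Toy batch: 2 "images" with 3 pixels each (made-up values).
y_true = [[0.0, 1.0, 0.5], [1.0, 1.0, 0.0]]
y_pred = [[0.0, 0.5, 0.5], [0.5, 1.0, 0.5]]

cost = half_sse(y_true, y_pred)
# Repeating the batch doubles the cost, since nothing is averaged.
cost_doubled = half_sse(y_true * 2, y_pred * 2)
print(cost, cost_doubled)
```

So the absolute cost value depends on the batch size and the number of pixels, which matters when comparing costs between data sets.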
I trained the autoencoder with around 250000 images. Below I show the results for test data, which I did not use for training. The upper row always shows the input images, the lower row always shows the output images.
The autoencoder works satisfactorily on the MNIST data (array cells are values from 0-1):
The autoencoder also works satisfactorily on photos I took myself (array cells are values from 0-100):
The root mean square errors
rmse = tf.sqrt(tf.reduce_mean(tf.square(tf.sub(y_true, y_pred)), axis=1))
are as one would expect - higher for the first three photos (darkened, whitened, put a cross in), and lower for the unchanged photos:
17.5, 29.6, 12.9, 11.7, 11.2, 11.7, 7.3, 7.1, 7.1, 8.1
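For reference, a pure-Python sketch of the same per-image calculation: the squared differences are averaged over the pixels of each image (the axis=1 reduction) and then the square root is taken, giving one value per image. The pixel values and the helper name rmse_per_image are made up for illustration.

```python
import math

def rmse_per_image(y_true, y_pred):
    """One RMSE per row: mean of squared errors over the pixels, then sqrt."""
    result = []
    for true_row, pred_row in zip(y_true, y_pred):
        mse = sum((t - p) ** 2 for t, p in zip(true_row, pred_row)) / len(true_row)
        result.append(math.sqrt(mse))
    return result

# Made-up 4-pixel "images" on the 0-100 scale.
y_true = [[0.0, 100.0, 50.0, 50.0], [10.0, 20.0, 30.0, 40.0]]
y_pred = [[0.0, 80.0, 50.0, 30.0], [10.0, 20.0, 30.0, 40.0]]

errors = rmse_per_image(y_true, y_pred)
print(errors)
```

The error is in the same units as the pixel values, so numbers for the 0-100 photos and for the 0-1 images are not directly comparable.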
However, the autoencoder does not work satisfactorily for texture images (array cells are values from 0-1). First I used RMSPropOptimizer and got:
Since my costs were pretty high during training and did not change, I followed the advice from https://stackoverflow.com/a/40956761/4533188 and used AdamOptimizer. Indeed, I got better results with it:
While the costs are lower, they are still constant during the epochs:
Also, the output images are pretty dark. I believe this might also be the reason why my rmse values are not as I would expect:
0.4642, 0.2669, 0.4976, 0.4378, 0.4753, 0.4688, 0.4615, 0.4571, 0.4691, 0.4487
Please note that I would expect the rmse to be high for the first image, due to the dot in the middle, and for the second image, since it is darkened. I guess the rmse values are not as hoped because the output images are so dark.
Questions
- Why are the costs not decreasing over the epochs and what can I do to get this going?
- Why are the output images so dark and what can I do to get them to be better representations of the input images?
- How do I get the rmse values to be as I expect?