I am attempting to build my first autoencoder neural net in TensorFlow. The dimensions of the layers in the encoder and decoder are the same, just reversed. The autoencoder learns to compress and reconstruct image data to a reasonable standard, but I would like to try to improve its performance by tying the decoder weights to be the exact transpose of the encoder weights.

I am lost with how to do this in TensorFlow.

Here is a snippet of the construction of my network:

import tensorflow as tf

imgW, imgH = 28, 28
learningRate = 0.001  # learning rate for the Adam optimizer below
encoderDims = [
    imgW * imgH,
    (imgW // 2) * (imgH // 2),
    (imgW // 3) * (imgH // 3),
    (imgW // 4) * (imgH // 4)
]
decoderDims = list(reversed(encoderDims))

encoderWeights, encoderBiases = [], []
decoderWeights, decoderBiases = [], []
for layer in range(len(encoderDims) - 1):
    encoderWeights.append(
        tf.Variable(tf.random_normal([encoderDims[layer], encoderDims[layer + 1]]))
    )
    encoderBiases.append(
        tf.Variable(tf.random_normal([encoderDims[layer + 1]]))
    )
    decoderWeights.append(
        tf.Variable(tf.random_normal([decoderDims[layer], decoderDims[layer + 1]]))
    )
    decoderBiases.append(
        tf.Variable(tf.random_normal([decoderDims[layer + 1]]))
    )

inputs = tf.placeholder(tf.float32, [None, imgW * imgH])  # renamed to avoid shadowing the built-in input
encoded = inputs
for layer in range(len(encoderDims) - 1):
    encoded = tf.add(tf.matmul(encoded, encoderWeights[layer]), encoderBiases[layer])
    encoded = tf.nn.sigmoid(encoded)

decoded = encoded
for layer in range(len(decoderDims) - 1):
    decoded = tf.add(tf.matmul(decoded, decoderWeights[layer]), decoderBiases[layer])
    if layer != len(decoderDims) - 2:  # keep the output layer linear
        decoded = tf.nn.sigmoid(decoded)

loss = tf.losses.mean_squared_error(labels=inputs, predictions=decoded)
train = tf.train.AdamOptimizer(learningRate).minimize(loss)

The two issues I do not know how to overcome are:

  1. How can I adjust only the encoder parameters during training with respect to the loss?
  2. How can I create the decoder weights and biases in such a way that after each iteration of training of the encoder parameters, they are set as the transpose of the newly adjusted encoder parameters?
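To make the shape requirement in point 2 concrete, here is a small NumPy sketch (NumPy standing in for TensorFlow; the layer sizes mirror the encoderDims above for 28x28 images, with biases and decoder activations omitted for clarity):

```python
import numpy as np

# Layer sizes matching encoderDims for 28x28 images: 784 -> 196 -> 81 -> 49
encoderShapes = [(784, 196), (196, 81), (81, 49)]
encoderWeights = [np.random.randn(*s) * 0.01 for s in encoderShapes]

# A tied decoder reuses the same matrices, transposed and in reverse order
decoderWeights = [W.T for W in reversed(encoderWeights)]

# The shapes chain correctly, so the reconstruction maps back to 784 features
x = np.random.randn(1, 784)
h = x
for W in encoderWeights:
    h = 1.0 / (1.0 + np.exp(-(h @ W)))   # sigmoid activation
for W in decoderWeights:
    h = h @ W                            # biases and activations omitted
print(h.shape)                           # -> (1, 784)
```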
KOB

1 Answer


I doubt that this will outperform regular autoencoders, but if you get surprisingly good results, let the community know. Regarding your questions:

1.) Since the reconstruction error is computed between input and output, you have to train with the whole network (i.e. encoder and decoder together). However, you can set a flag on the decoder variables that prevents the optimizer from altering them after initialization: create them with trainable=False. After training for an epoch, you can then manually assign them the transposed encoder weights.
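As a toy illustration of that training scheme (a pure NumPy stand-in for the TensorFlow mechanics, not the network above): only an encoder matrix is updated by gradient descent, and the decoder is simply re-derived as its transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))          # toy data batch
W = rng.normal(size=(8, 3)) * 0.1     # encoder weights; decoder is W.T

def reconstruction_loss(W):
    R = X @ W @ W.T - X               # residual of the tied reconstruction
    return (R ** 2).mean()

lossBefore = reconstruction_loss(W)
for step in range(500):
    R = X @ W @ W.T - X
    # Analytic gradient of mean((X W W^T - X)^2) w.r.t. W,
    # accounting for both occurrences of W in the forward pass
    grad = 2.0 * (X.T @ R + R.T @ X) @ W / X.size
    W -= 0.05 * grad                  # only the encoder parameters move
decoderW = W.T                        # decoder stays the exact transpose
lossAfter = reconstruction_loss(W)
```

The same structure carries over to TensorFlow: compute gradients of the full reconstruction loss, apply them only to the encoder variables, then copy the transposed encoder weights into the (non-trainable) decoder variables.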

2.) Here, I am not sure what you mean by 'transpose'. If you mean that the weights of the first layer of the encoder should match those of the last layer of the decoder, you can try this:

for layer in range(len(encoderWeights)):
    decoderWeights[-layer-1] = tf.transpose(encoderWeights[layer])

If you want to transpose the layer matrices individually, you can use tf.transpose(). From a mathematical standpoint, the exact reverse of a matrix multiplication would be multiplication by the inverse matrix, but an inverse only exists for non-singular square matrices, which your layer weights are not. TensorFlow does provide tf.matrix_inverse(), but be very careful when using it, as a reasonable result is not guaranteed.
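A quick shape check (NumPy, with illustrative dimensions) of why the transpose is always available while the inverse generally is not:

```python
import numpy as np

W = np.random.randn(784, 400)        # a non-square encoder layer

# The transpose is always defined and maps 400 features back to 784
assert W.T.shape == (400, 784)

# A matrix inverse only exists for non-singular square matrices,
# so it cannot even be formed for this layer
try:
    np.linalg.inv(W)
except np.linalg.LinAlgError as e:
    print("inverse undefined:", e)
```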

DocDriven
  • "I doubt that this will outperform regular autoencoders." This paper (https://arxiv.org/pdf/1708.01715.pdf), which outlines successful results of using a deep autoencoder model for a recommender system (which is what I am attempting too), says that this is what they did in the last paragraph of page 1. Also, the original paper by Geoffrey Hinton proposing autoencoders (https://www.cs.toronto.edu/~hinton/science.pdf) used this transposition of the weight matrices from encoder to decoder too, followed by further fine-tuning of the encoder and decoder weights independently of each other. – KOB Jul 05 '18 at 10:32
  • With regards to your second point: Yes, it is just a simple mathematical transpose of the matrix that is needed, since the first weight matrix of the encoder would be of size, for example, 784x400, and hence the last weight matrix of the decoder would need to be the transpose of this, 400x784. – KOB Jul 05 '18 at 10:36
  • And on your 1st point, I think an approach similar to this could be used: on each iteration I could run just the encoder, then simulate the decoding process by multiplying by the transpose of each encoder weight matrix, and compute the loss from that. From this loss I could train just the encoder weights as you outlined. This would be repeated until the loss is sufficiently low, at which point I could construct the permanent decoder by transposing the trained encoder weights and save the whole model. – KOB Jul 05 '18 at 10:40
  • Regarding the first statement, this was solely based on my intuition. I haven't read the papers yet, but I'll definitely do that. Thank you for making me aware of that. Regarding your response to point 1.): that is a good in-detail description of how I would approach this. I just don't know what you mean by iteration. Do you intend to copy the encoder weights after training with each batch or after finishing an epoch? Regarding point 2.): I updated my answer as you specified your desired operation. – DocDriven Jul 05 '18 at 11:09
  • Yes, sorry, by iteration I simply meant each forward pass of a batch. The current input would be fed through the encoder, and then I would multiply the encoded layer through the transposes of the encoder weight matrices (simulating the decoder) to be able to calculate the loss, and then adjust the encoder weights accordingly before processing the next batch. I have got it working, but it doesn't seem to work well without a bias, and now I don't think it is at all possible to achieve this "mirroring" of the encoder to the decoder when biases are involved. – KOB Jul 05 '18 at 12:31
  • For example, if the weights of the first layer of the encoder form a 784x400 matrix, then the bias there would be a vector of 400 values. However, the final layer of the decoder would have a 400x784 weight matrix, and so its bias would need to be of size 784, which does not exist anywhere in the encoder. – KOB Jul 05 '18 at 12:31
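The mismatch described in the last comments can be seen directly from the shapes. A common workaround in tied-weight autoencoders (a convention, not something prescribed by the papers linked above) is to tie only the weight matrices and give the decoder its own independent, trainable bias vectors:

```python
import numpy as np

encW = np.random.randn(784, 400) * 0.01   # encoder weights
encB = np.zeros(400)                      # encoder bias: one value per hidden unit

decW = encW.T                             # tied decoder weights: (400, 784)
# encB has 400 entries, but the decoder's output layer needs 784,
# so the decoder bias cannot be a transpose of anything in the encoder.
decB = np.zeros(784)                      # give the decoder its own bias instead

x = np.random.randn(1, 784)
h = 1.0 / (1.0 + np.exp(-(x @ encW + encB)))
xHat = h @ decW + decB
assert xHat.shape == x.shape              # reconstruction has the input's shape
```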