I am training a deep autoencoder (for now with 5 encoding layers and 5 decoding layers, using leaky ReLU) to reduce the dimensionality of the data from about 2000 dimensions to 2. I can train the model on 10k samples and the outcome is acceptable. The problem arises with bigger datasets (50k to 1M samples): the same model with the same optimizer, dropout, etc. does not work, and the training gets stuck after a few epochs. I am doing a hyper-parameter search on the optimizer (I am using Adam), but I am not sure whether that will solve the problem.
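
Roughly, the model is set up like this (a minimal Keras 2-style sketch; the exact layer sizes, LeakyReLU slope, and dropout rate here are illustrative assumptions, not my real values):

    # Minimal sketch of the setup described above; sizes and rates are illustrative.
    from keras.models import Model
    from keras.layers import Input, Dense, Dropout, LeakyReLU

    input_dim = 2000                            # roughly the dimensionality of the data
    encoder_sizes = [1024, 512, 256, 64, 2]     # 5 encoding layers down to 2 dims
    decoder_sizes = [64, 256, 512, 1024]        # 4 hidden decoding layers + output

    x_in = Input(shape=(input_dim,))
    h = x_in
    for size in encoder_sizes:
        h = Dense(size)(h)
        h = LeakyReLU(alpha=0.1)(h)
        if size != 2:                           # no dropout on the 2-dim code layer
            h = Dropout(0.2)(h)
    for size in decoder_sizes:
        h = Dense(size)(h)
        h = LeakyReLU(alpha=0.1)(h)
    x_out = Dense(input_dim, activation='linear')(h)

    autoencoder = Model(x_in, x_out)
    autoencoder.compile(optimizer='adam', loss='mse')

    # x_train: array of shape (n_samples, input_dim); fine on 10k samples, stalls on 50k+
    # autoencoder.fit(x_train, x_train, epochs=100, batch_size=256)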

Should I look for something else to change or check? Does the batch size matter in this case? Should I try to solve the problem by fine-tuning the optimizer? Should I play with the dropout ratio? ...

Any advice is very much appreciated.

p.s. I am using Keras. It is very convenient. If you do not know about it, then check it out: http://keras.io/


2 Answers

I would ask the following questions when trying to find the cause of the problem:

1) What happens if you change the size of the middle layer from 2 to something bigger? Does it improve the performance of the model trained on the >50k training set?

2) Are the 10k training and test examples randomly selected from the 1M dataset?

My guess is that your model is simply not able to compress and reconstruct your 50k-1M data through just 2 dimensions in the middle layer. It is easier for the model to fit its parameters on 10k examples, so the middle-layer activations are more sensible in that case, but for >50k examples the activations are essentially random noise.
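
As a concrete way to check 1), you could parameterize the bottleneck size and compare validation reconstruction loss for a few values. A minimal sketch (the helper function, layer sizes, and candidate bottleneck dimensions are assumptions for illustration only):

    # Hypothetical sketch: compare reconstruction loss for several bottleneck sizes.
    from keras.models import Model
    from keras.layers import Input, Dense, LeakyReLU

    def build_autoencoder(input_dim, bottleneck_dim):
        x_in = Input(shape=(input_dim,))
        h = Dense(256)(x_in)
        h = LeakyReLU(alpha=0.1)(h)
        code = Dense(bottleneck_dim)(h)
        h = Dense(256)(code)
        h = LeakyReLU(alpha=0.1)(h)
        x_out = Dense(input_dim, activation='linear')(h)
        model = Model(x_in, x_out)
        model.compile(optimizer='adam', loss='mse')
        return model

    # for dim in [2, 8, 32, 128]:
    #     model = build_autoencoder(x_train.shape[1], dim)
    #     history = model.fit(x_train, x_train, epochs=20, batch_size=256,
    #                         validation_split=0.1, verbose=0)
    #     print(dim, history.history['val_loss'][-1])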

  • Thanks! All these tips are useful. I tested different 10k subsamples, and they do not make any difference, so I presume the samples are chosen fairly. As for the encoding layer size, increasing it does not help: I tried 10 instead of 2, and in that case neither the 10k nor the 50k set trains correctly. – Mos Jun 16 '16 at 12:21

After some investigation, I have realized that the layer configuration I was using is ill-suited to the problem, and this seems to cause at least part of the trouble.

I had been using a sequence of layers for encoding and decoding, with layer sizes chosen to decrease linearly, for example:

input: 1764 (dims)
hidden1: 1176
hidden2: 588
encoded: 2
hidden3: 588
hidden4: 1176
output: 1764 (same as input)

However, this seems to work only occasionally, and it is sensitive to the choice of hyper-parameters.

I replaced this with exponentially decreasing layer sizes for the encoder (and the reverse for the decoder), so: 1764, 128, 16, 2, 16, 128, 1764.
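
For reference, this is roughly how the model can be built from that list of sizes (a minimal sketch; the LeakyReLU slope and the commented-out training call are illustrative assumptions):

    # Minimal sketch of the exponentially decreasing architecture above.
    from keras.models import Model
    from keras.layers import Input, Dense, LeakyReLU

    sizes = [1764, 128, 16, 2, 16, 128, 1764]

    x_in = Input(shape=(sizes[0],))
    h = x_in
    for size in sizes[1:-1]:
        h = Dense(size)(h)
        h = LeakyReLU(alpha=0.1)(h)
    x_out = Dense(sizes[-1], activation='linear')(h)

    model = Model(x_in, x_out)
    model.compile(optimizer='adam', loss='mse')
    # model.fit(x_train, x_train, epochs=100, batch_size=256)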

In this case the training seems to proceed more robustly. I still have to run a hyper-parameter search to see how sensitive this configuration is, but a few manual trials suggest it is fairly robust.

I will post an update if I encounter some other interesting points.
