
I have a data set which has 2 features and 10,000 samples. I would like to convert (integrate) these two features into one feature for further analysis, so I want to use a feature extraction method. As the relationship between the two features is not linear, I want to use methods other than conventional PCA.

Because the number of samples is much larger than the number of features, I think an autoencoder is a good way to do feature extraction. But with only 2 input features, the shape of the autoencoder would be just 2-1-2, which I believe amounts to a linear extraction.

Is it possible to use more hidden nodes than input nodes and build a stacked autoencoder, for example with 2-16-8-1-8-16-2 nodes?

Also, is an autoencoder a good choice for this kind of data integration? If not, are there better solutions?
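
For concreteness, here is a minimal sketch of the kind of stacked autoencoder I have in mind, assuming Keras on top of TensorFlow; the activations, optimizer, and epoch count are placeholder guesses, not a tested design:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data standing in for my real set: 10000 samples, 2 features
X = np.random.rand(10000, 2).astype("float32")

# 2-16-8-1-8-16-2 stacked autoencoder
inputs = keras.Input(shape=(2,))
h = layers.Dense(16, activation="sigmoid")(inputs)
h = layers.Dense(8, activation="sigmoid")(h)
code = layers.Dense(1, activation="sigmoid", name="code")(h)  # 1-D bottleneck
h = layers.Dense(8, activation="sigmoid")(code)
h = layers.Dense(16, activation="sigmoid")(h)
outputs = layers.Dense(2, activation="linear")(h)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=256)

# The integrated single feature would come from the encoder half
encoder = keras.Model(inputs, code)
feature = encoder.predict(X)  # shape (10000, 1)
```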

ToBeSpecific
  • Questions asking for guidance on designing and training neural networks are off-topic for Stack Overflow, unless they address implementation details, which does not seem to be the case here. Also note that asking for sample code does not make a good question either. If you need assistance with the theoretical background behind autoencoders, see [Cross Validated](https://stats.stackexchange.com). If you stumble upon a concrete issue while implementing the model with TensorFlow, then writing a [MCVE] is important for us to understand the question. – E_net4 Jul 21 '17 at 10:04
  • Thank you for your comment. I agree that it's a better fit for Cross Validated, because the design of the autoencoder is the main issue. I will move this to Cross Validated. – ToBeSpecific Jul 22 '17 at 13:36

1 Answer


Why would this be a linear extraction? If you use any non-linearity in the hidden and output layers, you will get a non-linear relationship between them. Your encoding will essentially be sigmoid(Ax + b).
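
A quick numeric sketch of that point, with made-up weights `A` and `b` (not learned from any data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

A = np.array([[0.7, -1.3]])  # made-up encoder weights, shape (1, 2)
b = np.array([0.1])          # made-up bias

def encode(x):
    # The 2-to-1 encoding: sigmoid(Ax + b), non-linear in x
    return sigmoid(A @ x + b)

# A linear map would double its output when the input doubles;
# the sigmoid encoding does not:
print(encode(np.array([1.0, 1.0])))  # ~0.378
print(encode(np.array([2.0, 2.0])))  # ~0.250, not 2 * 0.378
```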

If you truly want to make your network more complex, I would suggest using multiple 2-neuron layers before the single-neuron layer: something like 2-2-2-1-2-2-2 nodes. I do not see any reason why you would need to make it larger.
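
A minimal sketch of that shape, assuming Keras; the activations and loss are placeholders to adjust for your data:

```python
from tensorflow import keras
from tensorflow.keras import layers

# 2-2-2-1-2-2-2: stacking narrow layers instead of blowing the width up
model = keras.Sequential([
    layers.Dense(2, activation="sigmoid", input_shape=(2,)),
    layers.Dense(2, activation="sigmoid"),
    layers.Dense(1, activation="sigmoid", name="code"),  # 1-D bottleneck
    layers.Dense(2, activation="sigmoid"),
    layers.Dense(2, activation="sigmoid"),
    layers.Dense(2, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")
```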

Thomas Pinetz
  • But I think using a single hidden layer with a single node will produce a result similar to linear PCA, if there is a relationship between the two input nodes. Also, I think simply adding hidden layers, such as 2-2-1-2-2, will not change the result, because the deeper hidden layers may choose the same weights (0.5 and 0.5) for each input. – ToBeSpecific Jul 22 '17 at 13:41
  • But by using a non-linearity as the activation function you will not get a linear relationship, but a non-linear one. If you want a more complex function, use multiple sigmoids. I do not see the benefit of first blowing the information up and then scaling it down. If you are worried about using a deep network, use skip connections (see the sketch after these comments). – Thomas Pinetz Jul 22 '17 at 14:04
  • Thank you for your answer. Then do you think there are better options for this kind of work besides an autoencoder? In fact I tried t-SNE and kernel PCA before, but the performance of the autoencoder was much better. – ToBeSpecific Jul 22 '17 at 14:44
  • Dimensionality reduction is a branch of its own. I have had success learning a representation based on the labels of the data using metric learning. This naturally does not work as well for a regression problem. There are other methods you could try, but I do not think they will yield better results. Anyway, an evaluation of such techniques is "Perception-based evaluation of projection methods for multidimensional data visualization". Maybe one of the other methods works for your use case. – Thomas Pinetz Jul 22 '17 at 15:21
  • Thank you for your advice. May I ask one more question? My final goal is to 'integrate' (or merge) two datasets into one to make further analysis, such as univariate feature selection with Pearson correlation, easier. Do you think dimensionality reduction techniques are suitable for this work? – ToBeSpecific Jul 22 '17 at 20:42
  • If you want to make your dataset smaller, there are either dimensionality reduction techniques or manual sorting. Without samples it's hard to tell, but they were invented for similar use cases, so I would tend to say that they are suitable. – Thomas Pinetz Jul 24 '17 at 20:51
  • Thanks Thomas for your advice. – ToBeSpecific Jul 25 '17 at 04:27
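
For reference, a minimal sketch of the skip connection suggested in the comments, again assuming Keras; this is an illustration of the idea, not a tested design, and whether it helps on this data is an open question:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(2,))
h = layers.Dense(2, activation="sigmoid")(inputs)
h = layers.Add()([h, inputs])            # skip connection around the block
code = layers.Dense(1, activation="sigmoid")(h)
h = layers.Dense(2, activation="sigmoid")(code)
outputs = layers.Dense(2, activation="linear")(h)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```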