Many of the papers I have read so far mention that pre-training a network "could improve computational efficiency in terms of back-propagating errors", and that this can be achieved using RBMs or autoencoders.
If I have understood correctly, autoencoders work by learning the identity function, and if the number of hidden units is smaller than the size of the input, they also perform compression. But what does this have to do with improving the computational efficiency of propagating the error signal backwards? Is it because the weights of the pre-trained hidden units do not diverge much from their initial values?
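To make sure I am interpreting "pre-training" correctly, here is a minimal sketch of my understanding (assuming PyTorch; the layer sizes and the random data are just placeholders): the encoder is first trained to reconstruct the input, and its weights are then reused as the initialization of the first hidden layer before supervised fine-tuning.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 784 inputs, 128 hidden units, 10 classes.
n_in, n_hidden, n_classes = 784, 128, 10

# 1) Unsupervised phase: train an autoencoder to reconstruct its input.
encoder = nn.Linear(n_in, n_hidden)
decoder = nn.Linear(n_hidden, n_in)
autoencoder = nn.Sequential(encoder, nn.Sigmoid(), decoder)

opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x_unlabeled = torch.rand(256, n_in)  # stand-in for unlabeled data
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(autoencoder(x_unlabeled), x_unlabeled)  # target = input
    loss.backward()
    opt.step()

# 2) Supervised phase: reuse the trained encoder as the initialization of
#    the first hidden layer, then fine-tune the whole network with labels.
classifier = nn.Sequential(encoder, nn.Sigmoid(), nn.Linear(n_hidden, n_classes))
```

If this picture is right, the only thing pre-training changes is where back-propagation starts from, which brings me back to the question above about why that helps computationally.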
I assume data scientists reading this already know that autoencoders use the inputs as the target values, since they learn the identity function, which is why this counts as unsupervised learning. But can such a method be applied to convolutional neural networks, where the first hidden layer is a feature map? Each feature map is created by convolving a learned kernel with a receptive field of the image. How could this learned kernel be obtained by pre-training (in an unsupervised fashion)? Would it be something like the convolutional autoencoder sketched below?
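This is my guess at what such a convolutional autoencoder would look like (again a minimal sketch assuming PyTorch; the kernel count, kernel size, and random images are placeholders). The encoder's kernels are learned purely by reconstructing the image, with no labels involved.

```python
import torch
import torch.nn as nn

# Hypothetical setup: grayscale 28x28 images, 8 learned 5x5 kernels.
encoder = nn.Conv2d(1, 8, kernel_size=5, padding=2)            # produces 8 feature maps
decoder = nn.ConvTranspose2d(8, 1, kernel_size=5, padding=2)   # reconstructs the image
conv_autoencoder = nn.Sequential(encoder, nn.ReLU(), decoder)

opt = torch.optim.Adam(conv_autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

images = torch.rand(64, 1, 28, 28)  # stand-in for unlabeled images
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(conv_autoencoder(images), images)  # target = input image
    loss.backward()
    opt.step()

# The encoder now holds 8 kernels learned without any labels.
pretrained_kernels = encoder.weight.detach()  # shape: (8, 1, 5, 5)
```

Is copying `encoder.weight` into the first convolutional layer of the supervised CNN what is meant by obtaining the kernel through unsupervised pre-training, or is it done differently in practice (e.g. with convolutional RBMs)?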